Abstract

The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data 
has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this 
paper, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing 
it as a multi-criteria optimization problem, and by deriving a set of features that capture the key properties of a 
probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice 
for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e,
that generalize or can approximate many of the previously proposed ranking functions. We present novel generating 
functions-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the 
datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further 
propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to 
learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of 
our parameterized ranking functions, especially PRF^e, at approximating other ranking functions and the scalability
of our proposed algorithms for exact or approximate ranking. 

1 Introduction 

Recent years have seen a dramatic rise in the number of application domains that naturally generate uncertain data and
that demand support for complex, decision-support queries over them. In part, this has been due to the increasing prevalence
of applications such as information retrieval [16], data integration and cleaning [2][12], text analytics [25][21], social
network analysis [1], etc. At the same time, large-scale instrumentation of nearly every aspect of our world using
sensor monitoring infrastructures has resulted in an abundance of uncertain, noisy data ifTTl Rjll. 

By their very nature, many of these applications require support for ranking over large scale datasets. For instance, 
consider a House Search application, where a user is searching for a house using a real estate sales dataset: House(id,
price, size, zipcode, ...). However, such a dataset, which may be constructed by crawling and combining data from
multiple sources, is inherently uncertain and noisy. In fact, the houses that the user prefers the most are also the most
likely to be sold by now. We may denote such uncertainty by associating with each advertisement a probability that
it is still valid. However, incorporating such uncertainties into the returned answers is a challenge, considering the 
complex interplay between the score of a house by itself, and the probability that the advertisement is still valid. 

The importance of this natural problem has led to much work on ranking in probabilistic databases in recent 
years. That prior work (which we review in more detail later) has proposed many different functions for combining 
the scores and the probabilities. We begin with a systematic exploration of these issues by recognizing that ranking 
in probabilistic databases is inherently a multi-criteria optimization problem, and by deriving a set of features, the 
key properties of a probabilistic dataset that influence the ranked result. We empirically illustrate the diverse and 
conflicting behavior of several natural ranking functions, and argue that a single specific ranking function may not 
be appropriate to rank different uncertain databases that we may encounter in practice. Furthermore, different users 
may weigh the features differently, resulting in different rankings over the same dataset. We then define a general and 
powerful ranking function, called PRF, that allows us to explore the space of possible ranking functions. We discuss its 
relationship to previously proposed ranking functions, and also identify two specific parameterized ranking functions,
called PRF^ω and PRF^e, as being interesting. The PRF^ω ranking function is essentially a linear, weighted ranking
function that resembles the scoring functions typically used in information retrieval, web search, etc. [22][28][5][10].
We observe that such a function may not be suitable for ranking large-scale probabilistic databases due to its high
running time, and instead propose PRF^e, which uses a single parameter, and can approximate previously proposed
ranking functions for probabilistic databases very well.

We then develop novel algorithms based on generating functions to efficiently rank the tuples in a probabilistic 
dataset using any PRF ranking function. Our algorithm can handle a probabilistic dataset with arbitrary correlations; 
however, it is particularly efficient when the probabilistic database contains only mutual exclusivity and/or mutual
co-existence correlations (called probabilistic and/xor trees [30]). Our results apply to some of the previously proposed
ranking functions as well (one of our results was also independently obtained by Yi et al. [36]). Our main contributions
can be summarized as follows: 

• We develop a framework for learning ranking functions over probabilistic databases by identifying a set of key 
features and by proposing several parameterized ranking functions. 

• We present novel algorithms based on generating functions that enable highly efficient processing of top-k queries
over very large datasets. Our key algorithm is an $O(n \log n)$ algorithm ($O(n)$ if the dataset is pre-sorted by score)
for evaluating a PRF^e function over datasets with low correlations (more specifically, constant height probabilistic
and/xor trees).

• We present a polynomial time algorithm for computing the top-k answers for a correlated dataset, where the
correlations are represented using a bounded-treewidth Markov network. The algorithm we present is actually for
computing the probability that a given tuple is ranked at a given position across all the possible worlds, and is of 
independent interest. 

• We develop a novel, DFT-based algorithm for approximating an arbitrary weighted ranking function using a linear 
combination of PRF^e functions.

• We present a comprehensive experimental study over several real and synthetic datasets, comparing the behavior of 
the ranking functions and the effectiveness of our proposed algorithms. 

Outline: We begin with a brief discussion of the related work (Section 2). In Section 3, we review our probabilistic
database model and the prior work on ranking in probabilistic databases, and propose two parameterized ranking
functions. In Section 4, we present our generating functions-based algorithms for ranking. We then present an approach
to approximate different ranking functions using our parameterized ranking functions, and to learn a ranking function
from user preferences (Section 5). We then briefly sketch an extension of our ranking algorithms to correlated datasets
where the correlations are modeled using Markov networks (Section 6). We conclude with an experimental study in
Section 7.

2 Related Work 

There has been much work on managing probabilistic, uncertain, incomplete, and/or fuzzy data in database systems 
(see e.g. [29][16][6][8][35][3]). With a rapid increase in the number of application domains where uncertain data arises
naturally, such as data integration, information extraction, sensor networks, and pervasive computing, this area has seen
renewed interest in recent years [11]. This work has spanned a range of issues from theoretical development of data
models and data languages, to practical implementation issues such as indexing techniques, and several research efforts
are underway to build systems to manage uncertain data (e.g. MYSTIQ [8], Trio [35], ORION [6], MayBMS [3]).
The approaches can be differentiated between tuple-level uncertainty, where "existence" probabilities are attached to
the tuples of the database, and attribute-level uncertainty, where (possibly continuous) probability distributions are
attached to the attributes. The proposed approaches differ further based on whether they consider correlations or not.
Most work in probabilistic databases has either assumed independence [16][8] or has restricted the correlations that can
be modeled [29][2][32]. More recently, several approaches have been presented that allow representation of arbitrary
correlations [33][31].
Figure 1: Example of a probabilistic database which contains automatically captured information about speeding cars.
Tuples $t_2$ and $t_3$ (similarly, $t_4$ and $t_5$) are mutually exclusive. The corresponding and/xor tree compactly encodes these
correlations.

The area of ranking and top-k query processing has also seen much work in databases (see Ilyas et al. [24] for a
survey). More recently, several researchers have considered top-k query processing in probabilistic databases. Soliman
et al. [34] defined the problem of ranking over probabilistic databases, and proposed two ranking functions to combine
tuple scores and probabilities. Yi et al. [36] present improved algorithms for the same ranking functions. Zhang and
Chomicki [37] present desiderata for ranking functions. Ming Hua et al. [23] recently presented a different approach
called probabilistic threshold queries. Finally, Cormode et al. [7] also present a semantics of ranking functions and
a new ranking function called expected rank. We will review these ranking functions in detail in the next section. There
has also been work on top-k query processing in probabilistic databases where the ranking is by the result tuple
probabilities (i.e., probability and score are identical) [31]. The main challenge in that work is efficient computation
of the probabilities, whereas we assume that the probability and score are either given or can be computed easily.



3 Problem Formulation 

We begin with defining our model of a probabilistic database, called probabilistic and/xor tree [30], that allows
capturing several common types of correlations. We then review the prior work on top-k query processing in probabilistic
databases, and argue that a single specific ranking function may not capture the intricacies of ranking with uncertainty.
We then present our parameterized ranking functions, PRF^ω and PRF^e.

3.1 Probabilistic Database Model 

We use the prevalent possible worlds semantics for probabilistic databases [8]. We denote a probabilistic relation
with tuple uncertainty by $D_T$, where $T$ denotes the set of tuples (see Section 4.4 for extensions of our algorithms
to attribute uncertainty). The set of all possible worlds is denoted by $PW = \{pw_1, pw_2, \ldots, pw_n\}$. Each tuple
$t_i \in T$ is associated with an existence probability $\Pr(t_i)$ and a score $\mathrm{score}(t_i)$, computed based on a scoring function
$\mathrm{score}: T \to \mathbb{R}$. Usually $\mathrm{score}(t)$ is computed based on the tuple attribute values and measures the relative user
preference for different tuples. In a deterministic database, tuples with higher scores should be ranked higher. We use
$r_{pw}: T \to \{1, \ldots, n\} \cup \{\infty\}$ to denote the rank of the tuple $t$ in a possible world $pw$ according to score. If $t$ does
not appear in the possible world $pw$, we let $r_{pw}(t) = \infty$. We say $t_1$ ranks higher than $t_2$ in the possible world $pw$ if
$r_{pw}(t_1) < r_{pw}(t_2)$. For each tuple $t$, we define a random variable $r(t)$ which denotes the rank of $t$ in $D_T$. In other
words, $\Pr(r(t) = k)$ is the total probability of the possible worlds where $t$ is ranked at position $k$.
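
To make the rank distribution $\Pr(r(t) = k)$ concrete, the following Python sketch enumerates every possible world of a tiny relation by brute force. It assumes mutually independent tuples, and the scores and probabilities are hypothetical; the function name `rank_distribution` is ours and is used only for illustration. The enumeration takes exponential time, so it serves to illustrate the definition rather than as a practical algorithm.

```python
from itertools import product

def rank_distribution(tuples):
    """tuples: list of (score, prob) pairs, assumed mutually independent.
    Returns dist with dist[i][k] = Pr(r(t_i) = k) for k = 1..n (index 0 unused)."""
    n = len(tuples)
    dist = [[0.0] * (n + 1) for _ in range(n)]
    # Each possible world is a 0/1 presence vector over the n tuples.
    for world in product([0, 1], repeat=n):
        pr = 1.0
        for present, (_, p) in zip(world, tuples):
            pr *= p if present else (1.0 - p)
        # Rank the tuples present in this world by score, highest first.
        present_ids = [i for i in range(n) if world[i]]
        present_ids.sort(key=lambda i: -tuples[i][0])
        for rank, i in enumerate(present_ids, start=1):
            dist[i][rank] += pr
    return dist

# Hypothetical data: scores 30 > 20 > 10 with probabilities 0.5, 0.6, 0.4.
dist = rank_distribution([(30, 0.5), (20, 0.6), (10, 0.4)])
```

Absent tuples contribute to no finite rank, so the entries `dist[i][1..n]` sum to the existence probability of the tuple rather than to 1.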

Probabilistic And/Xor Tree Model: Our algorithms can handle arbitrarily correlated relations with correlations
modeled using Markov networks (Section 6). However, in most of this paper, we focus on the probabilistic and/xor tree
model, introduced in our prior work [30], that can capture only a more restricted set of correlations, but admits highly
efficient ranking algorithms. More specifically, an and/xor tree captures two types of correlations: (1) mutual
exclusivity (denoted ∨ (xor)) and (2) mutual co-existence (denoted ∧ (and)). Two events satisfy the mutual co-existence correlation
if, in any possible world, either both events occur or neither occurs. Similarly, two events are mutually exclusive if
there is no possible world where both happen.

Definition 1 A probabilistic and/xor tree $\mathcal{T}$ is a tree where each leaf is a singleton tuple and each inner node has a
mark, ∨ (xor) or ∧ (and). For each ∨ node $u$ and each of its children $v$, there is a nonnegative value $p(u, v)$ associated with
the edge $(u, v)$. Moreover, we require $\sum_{v} p(u, v) \le 1$. Let $\mathcal{T}_v$ be the subtree rooted at $v$ and $v_1, \ldots, v_\ell$ be $v$'s
children. The subtree $\mathcal{T}_v$ inductively defines a random subset $S_v$ of its leaves by the following independent process:

• If $v$ is a leaf, $S_v = \{v\}$.

• If $v$ is a ∨ node, then $S_v = S_{v_i}$ with probability $p(v, v_i)$, and $S_v = \emptyset$ with the remaining probability $1 - \sum_{i} p(v, v_i)$.

• If $v$ is a ∧ node, then $S_v = \bigcup_{i} S_{v_i}$.

x-tuples (which can be used to specify mutual exclusivity correlations between tuples) correspond to the special
case where we have a tree of height 2, with a ∧ (and) node as the root and only ∨ (xor) nodes in the second level. Figure 1
shows an example of an and/xor tree that models the data from a traffic monitoring application [34], where the tuples
represent automatically captured traffic data. For example, the leftmost ∨ node indicates $t_1$ is present with probability
.4 and the second ∨ node dictates that exactly one of $t_2$ and $t_3$ should appear. The topmost ∧ node tells us the random
sets derived from these ∨ nodes coexist.
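
The generative process of Definition 1 can be sketched in Python by computing, bottom-up, the distribution over leaf subsets that each node induces. The nested-tuple encoding of the tree and the function name `worlds` are our own illustrative choices. The small tree below follows the description above (one node holding $t_1$ with probability .4, one node choosing exactly one of $t_2$ and $t_3$), but the 0.7/0.3 split between $t_2$ and $t_3$ is an assumed value, not taken from Figure 1.

```python
from itertools import product

def worlds(node):
    """Distribution over leaf subsets induced by an and/xor (sub)tree.
    node is ("leaf", name), ("xor", [(prob, child), ...]) or ("and", [child, ...]).
    Returns {frozenset_of_leaf_names: probability}."""
    kind = node[0]
    if kind == "leaf":
        return {frozenset([node[1]]): 1.0}
    if kind == "xor":
        out, total = {}, 0.0
        for p, child in node[1]:
            total += p
            for s, q in worlds(child).items():
                out[s] = out.get(s, 0.0) + p * q
        if 1.0 - total > 1e-12:   # leftover mass: no child is selected
            out[frozenset()] = out.get(frozenset(), 0.0) + (1.0 - total)
        return out
    # "and" node: the children's random sets are independent and are unioned.
    out = {frozenset(): 1.0}
    for child in node[1]:
        nxt = {}
        for (s1, q1), (s2, q2) in product(out.items(), worlds(child).items()):
            nxt[s1 | s2] = nxt.get(s1 | s2, 0.0) + q1 * q2
        out = nxt
    return out

# Hypothetical tree; the 0.7 / 0.3 split is an assumption for illustration.
tree = ("and", [
    ("xor", [(0.4, ("leaf", "t1"))]),
    ("xor", [(0.7, ("leaf", "t2")), (0.3, ("leaf", "t3"))]),
])
w = worlds(tree)
```

For this tree the result contains four possible worlds, and no world contains both $t_2$ and $t_3$, which is the mutual exclusivity the ∨ node encodes.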

Probabilistic and/xor trees significantly generalize x-tuples [32][36], the block-independent disjoint tuples model, and
p-or-sets O, and can in fact represent a finite set of arbitrary possible worlds [30]. The correlations captured by
such a tree can be represented by probabilistic c-tables [20] and provenance semirings [19]. However, that does not
directly imply an efficient algorithm for ranking. We note that no prior work on ranking in probabilistic databases has 
considered more complex correlations than x-tuples. 



3.2 Ranking over Probabilistic Data: Definitions and Prior Work 

The interplay between probabilities and scores complicates the semantics of ranking in probabilistic databases. This 
was observed by Soliman et al. [34], who first considered this problem and presented two definitions of top-k queries
in probabilistic databases. Several other definitions of ranking have been proposed since then. We briefly review the 
ranking functions we consider in this work. 

• Uncertain Top-k (U-Top) [34]: Here the query returns the $k$-tuple set that appears as the top-k answer in most
possible worlds (weighted by the probabilities of the worlds).

• Uncertain Rank-k (U-Rank) [34]: At each rank $i$, we return the tuple with the maximum probability of being at
the $i$'th rank in all possible worlds. In other words, U-Rank returns:

$\{t_i^*, i = 1, 2, \ldots, k\}$, where $t_i^* = \operatorname{argmax}_t \Pr(r(t) = i)$.

• Probabilistic Threshold Top-k (PT(h)) [23]: The original definition of a probabilistic threshold query asks for all
tuples with probability of being in the top-h answer larger than a pre-specified threshold, i.e., all tuples $t$ such that
$\Pr(r(t) \le h) > \text{threshold}$. For consistency with other ranking definitions, we slightly modify the definition and
instead ask for the $k$ tuples with the largest $\Pr(r(t) \le h)$ values.

• Expected Ranks (Exp-Rank) [7]: The tuples are ranked in the increasing order by $\sum_{pw \in PW} \Pr(pw)\, r_{pw}(t)$,
where $r_{pw}(t)$ is defined to be $|pw|$ if $t \notin pw$.

• Expected Score (E-Score): Another natural ranking function, also considered by [7], is simply to rank the tuples
by their expected score, $\Pr(t) \cdot \mathrm{score}(t)$.

Normalized Kendall Distance: To compare different ranking functions or criteria, we need a distance measure to 
evaluate the closeness of two top-k answers. We use the prevalent Kendall tau distance defined for comparing
top-k answers for this purpose [14]. It is also called Kemeny distance in the literature and is considered to have many
advantages over other distance metrics [13]. Let $\mathcal{R}_1$ and $\mathcal{R}_2$ denote two full ranked lists, and let $\mathcal{K}_1$ and $\mathcal{K}_2$ denote the
top-k ranked tuples in $\mathcal{R}_1$ and $\mathcal{R}_2$ respectively. Then the Kendall tau distance between $\mathcal{K}_1$ and $\mathcal{K}_2$ is defined to be:
$\mathrm{dis}(\mathcal{K}_1, \mathcal{K}_2) = \sum_{(i,j) \in P(\mathcal{K}_1, \mathcal{K}_2)} \hat{K}(i,j)$, where $P(\mathcal{K}_1, \mathcal{K}_2)$ is the set of all unordered pairs of $\mathcal{K}_1 \cup \mathcal{K}_2$, and $\hat{K}(i,j) = 1$
if it can be inferred from $\mathcal{K}_1$ and $\mathcal{K}_2$ that $i$ and $j$ appear in opposite order in the two full ranked lists $\mathcal{R}_1$ and $\mathcal{R}_2$,
otherwise $\hat{K}(i,j) = 0$. Intuitively the Kendall distance measures the number of inversions or flips between the two
rankings. For ease of comparison, we divide the Kendall distance by $k^2$ to obtain the normalized Kendall distance, which
always lies in [0, 1].
A higher value of the Kendall distance indicates a larger disagreement between the two top-k lists. It is easy to see
that if the normalized Kendall distance between two top-k answers is $\delta$, then the two answers must share at least a $1 - \sqrt{\delta}$ fraction
of tuples (so if the distance is 0.09, then the top-k answers share at least 70%, and typically 90% or more tuples). The
distance is 1 only if the two top-k answers are completely disjoint, and it is 0 when they are identical.
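
The normalized distance can be sketched in Python as follows. This is one reading of the definition above, under the assumption that a tuple missing from a top-k list is known to rank below every tuple in that list, and that a pair missing from the same list entirely contributes 0 because its order there cannot be inferred. The function name `topk_kendall` is ours.

```python
from itertools import combinations

def topk_kendall(k1, k2):
    """Normalized Kendall distance between two top-k lists (best tuple first)."""
    k = len(k1)
    assert len(k2) == k
    pos1 = {t: r for r, t in enumerate(k1)}
    pos2 = {t: r for r, t in enumerate(k2)}
    below = k  # a tuple absent from a list ranks below all k listed tuples
    flips = 0
    for i, j in combinations(set(k1) | set(k2), 2):
        a1, b1 = pos1.get(i, below), pos1.get(j, below)
        a2, b2 = pos2.get(i, below), pos2.get(j, below)
        if a1 == b1 or a2 == b2:
            continue  # both absent from one list: relative order unknown
        if (a1 < b1) != (a2 < b2):
            flips += 1  # the two lists provably disagree on this pair
    return flips / (k * k)

same = topk_kendall(["a", "b", "c"], ["a", "b", "c"])
swapped = topk_kendall(["a", "b"], ["b", "a"])
disjoint = topk_kendall(["a", "b"], ["c", "d"])
```

With two disjoint lists, each of the $k^2$ cross pairs is an inversion, which is why the normalization constant is $k^2$ and why the maximum value 1 is reached exactly in the disjoint case.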
Table 1: Normalized Kendall distance among various ranking functions for two datasets

Comparing Ranking Functions: We compared the top-100 answers returned by the five ranking functions with each 
other using the normalized Kendall distance, for two datasets with 100,000 independent tuples each (see Section 7 for
a description of the datasets). Table 1 shows the results of this experiment. As we can see, the five ranking functions
return wildly different top-k answers for the two datasets, with no obvious trends. For the first dataset, Exp-Rank 
behaves very differently from all other functions, whereas for the second dataset, Exp-Rank happens to be quite close 
to E-Score. However both of them deviate largely from U-Top, PT(h) and U-Rank. The behavior of E-Score is very 
sensitive to the dataset, especially the score distribution: it is close to PT(h) and U-Rank for the first dataset, but far 
away from all of them in the second dataset (on inspection, it shares fewer than 15 tuples with the top-100
answers of the others). We observed similar behavior for other datasets, and for datasets with correlations.

This simple experiment illustrates the issues with ranking in probabilistic databases - although several of these 
definitions seem natural, the wildly different answers they return indicate that none of them could be the "right" 
definition. 

We also observe that in large datasets, Exp-Rank tends to give very high priority to a tuple with a high probability
even if it has a low score. In our synthetic dataset Syn-IND-100,000 with expected size 50000, $t_2$ (the tuple with
the 2nd highest score) has probability approximately 0.98 and $t_{1000}$ (the tuple with the 1000th highest score) has probability
0.99. The expected ranks of $t_2$ and $t_{1000}$ are approximately 10000 and 6000 respectively, and hence $t_{1000}$ is ranked
above $t_2$ even though $t_{1000}$ is only slightly more probable.

3.3 Parameterized Ranking Functions 

Ranking in uncertain databases is inherently a multi-criteria optimization problem, and it is not always clear how to 
rank two tuples that dominate each other along different axes. Consider a database with two tuples $t_1$ (score = 100,
$\Pr(t_1) = 0.5$), and $t_2$ (score = 50, $\Pr(t_2) = 1.0$). Even in this simple case, it is not clear whether to rank $t_1$ above $t_2$
or vice versa. This is an instance of the classic risk-reward trade-off, and the choice between these two options largely 
depends on the application domain and/or user preferences. 

We propose to follow the traditional approach to dealing with such tradeoffs, by identifying a set of features, by 
defining a parameterized ranking function over these features, and by learning the parameters (weights) themselves 
using user preferences [22][28][5][10]. To achieve this, we propose a family of ranking functions, parameterized by
one or more parameters, and design algorithms to efficiently find the top-k answer according to any ranking function 
from these families. Our general ranking function, PRF, directly subsumes some of the previously proposed ranking 
functions, and can also be used to approximate other ranking functions. Moreover, the parameters can be learned from
user preferences, which allows us to adapt to different scenarios and different application domains. 



Features: Although it is tempting to use the tuple probability and the tuple score as the features, a ranking function 
based on just those two will be highly sensitive to the actual values of the scores; further, such a ranking function will 
be insensitive to the correlations in the database, and hence cannot capture the rich interactions between ranking and 
possible worlds. 

Instead we propose to use the following set of features: for each tuple $t$, we have $n$ features, $\Pr(r(t) = i)$, $i = 1, \ldots, n$,
where $n$ is the number of tuples in the database. This set of features succinctly captures the possible worlds.
Further, correlations among tuples, if any, are naturally accounted for when computing the features. We note that in 
most cases, we do not explicitly compute all the features, and instead design algorithms that can directly compute the 
value of the overall ranking function. 

Ranking Functions: Next we define a general ranking function which allows exploring the trade-offs discussed above. 

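Definition 2 Let $\omega: T \times \mathbb{N} \to \mathbb{C}$ be a weight function, where $\mathbb{C}$ is the set of complex numbers. The parameterized
ranking function (PRF), $\Upsilon_\omega: T \to \mathbb{C}$, in its most general form is defined to be:

$$\begin{aligned}
\Upsilon_\omega(t) &= \sum_{pw:\, t \in pw} \omega(t, r_{pw}(t)) \cdot \Pr(pw) \\
&= \sum_{pw:\, t \in pw} \sum_{i > 0} \omega(t, i) \Pr(pw \wedge r_{pw}(t) = i) \\
&= \sum_{i > 0} \omega(t, i) \cdot \Pr(r(t) = i).
\end{aligned}$$

A top-k query returns $k$ tuples with the highest $|\Upsilon_\omega|$ values.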

In most cases, $\omega$ is a real positive function and we just need to find the $k$ tuples with the highest $\Upsilon_\omega$ values. However,
we allow $\omega$ to be a complex function in order to approximate other functions efficiently (see Section 5.1). Depending
on the actual function $\omega$, we get different ranking functions with diverse behaviors. We illustrate some of these choices
and relate them to prior ranking functions.¹ We omit the subscript $\omega$ if the context is clear.

• If $\omega(t, i) = 1$, the result is the set of $k$ tuples with the highest probabilities [31].

• By setting $\omega(t, i) = \mathrm{score}(t)$, we get the E-Score:

$\Upsilon(t) = \sum_{pw:\, t \in pw} \mathrm{score}(t) \Pr(pw) = \mathrm{score}(t) \Pr(t) = E[\mathrm{score}(t)]$

• PRF^ω(h): One important class of ranking functions is when $\omega(t, i) = w_i$ (i.e., independent of $t$) and $w_i = 0$ for all $i > h$,
for some positive integer $h$. This forms one of the prevalent classes of ranking functions used in domains such as
information retrieval and machine learning, with the weights typically learned from user preferences [22][28][5][10].

• Two special cases of the PRF^ω function are:

1. $\omega(i) = 1$ if $i \le h$, and $\omega(i) = 0$ otherwise. If we return the $k$ tuples with the highest $\Upsilon_\omega(t)$ values, we have exactly the answer for
PT(h).

2. $\omega_j(i) = 1$ if $i = j$, and $\omega_j(i) = 0$ otherwise, for some $1 \le j \le k$. We can see that the tuple with the largest $\Upsilon_{\omega_j}$ value is the rank-$j$
answer in the U-Rank query [34].

This allows us to compute the U-Rank answer by evaluating $\Upsilon_{\omega_j}(t)$ for all $t \in T$ and $j = 1, \ldots, k$.



¹ The definition of U-Top introduced in [34] requires that the retrieved $k$ tuples belong to a valid possible world. However, this is not required in
our definition, and hence it is not possible to simulate U-Top using PRF.
• PRF^e(α): Finally, we define PRF^e to be a special case of the PRF^ω function, where $w_i = \omega(i) = \alpha^i$, where
$\alpha$ is a constant and may be a real or a complex number.
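
The special cases listed above differ only in the weight function plugged into the last line of Definition 2. The following Python sketch evaluates $\Upsilon(t) = \sum_i \omega(t, i) \Pr(r(t) = i)$ for a single tuple whose rank distribution is given. The rank distribution, the score, and the values of $h$ and $\alpha$ are hypothetical, and the function name `prf` is ours.

```python
def prf(rank_probs, weight):
    """rank_probs[j-1] = Pr(r(t) = j) for one tuple t;
    weight(j) returns omega(t, j) for that tuple."""
    return sum(weight(j) * p for j, p in enumerate(rank_probs, start=1))

# Hypothetical rank distribution of one tuple: Pr(r(t) = 1), Pr(r(t) = 2), Pr(r(t) = 3).
probs = [0.08, 0.2, 0.12]
score = 10.0          # assumed score(t)
h, alpha = 2, 0.5     # assumed parameters

existence = prf(probs, lambda j: 1.0)                      # omega = 1: Pr(t)
e_score = prf(probs, lambda j: score)                      # omega = score(t): E-Score
pt_h = prf(probs, lambda j: 1.0 if j <= h else 0.0)        # step weights: Pr(r(t) <= h)
u_rank_2 = prf(probs, lambda j: 1.0 if j == 2 else 0.0)    # indicator weights: Pr(r(t) = 2)
prf_e = prf(probs, lambda j: alpha ** j)                   # PRF^e(alpha)
```

Ranking all tuples by any one of these values and keeping the top $k$ gives the corresponding top-k answer; for U-Rank the indicator weight is evaluated once per rank $j$.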

PRF^ω and PRF^e form the two parameterized ranking functions that we propose in this work. Although PRF^ω is
the more natural ranking function and has been used elsewhere, PRF^e is more suitable for ranking in probabilistic
databases for various reasons. First, the features as we have defined above are not completely arbitrary, and the
features $\Pr(r(t) = i)$ for small $i$ are clearly more important than the ones for large $i$. Hence in most cases we would
like the weight function, $\omega(i)$, to be monotonically non-increasing. PRF^e naturally captures this behavior (as long
as $|\alpha| < 1$). More importantly, we can compute the PRF^e function in $O(n \log n)$ time ($O(n)$ time if the dataset
is pre-sorted by score) even for datasets with low degrees of correlations (i.e., modeled by and/xor trees with low
heights). This makes it significantly more attractive for ranking over large datasets.

Furthermore, ranking by PRF^e(α), with suitably chosen $\alpha$, can approximate rankings by many other functions
reasonably well even with only real $\alpha$. Finally, a linear combination of exponential functions, with complex bases,
is known to be very expressive in representing other functions [4]. We make use of this fact to approximate many
ranking functions by linear combinations of a small number of PRF^e functions, thus significantly reducing the
running time. We revisit this in Section 5.

4 Ranking Algorithms 

We next present an algorithm for efficiently ranking according to a PRF function. We first present the basic idea 
behind our algorithm assuming mutual independence, and then consider correlated tuples with correlations represented 
using an and/xor tree. We then present a very efficient algorithm for ranking using a PRF^e function, and then briefly
discuss how to handle attribute uncertainty. 

4.1 Assuming Tuple Independence 

First we show how the PRF function can be computed in $O(n^2)$ time for a general weight function $\omega$, and for a given
set of tuples $T = \{t_1, \ldots, t_n\}$. In all our algorithms, we assume that $\omega(t, i)$ can be computed in $O(1)$ time. Clearly it
would be sufficient to compute $\Pr(r(t) = j)$ for every tuple $t$ and $1 \le j \le n$ in $O(n^2)$ time. Given these values, we can
directly compute the values of $\Upsilon(t)$ in $O(n^2)$ time.

We first sort the tuples in a non-increasing order by their score function; assume $t_1, \ldots, t_n$ indicates this sorted
order. Suppose now we want to compute $\Pr(r(t_i) = j)$. Let $T_i = \{t_1, t_2, \ldots, t_i\}$ and $\sigma_i$ be an indicator variable
that takes value 1 if $t_i$ is present, and 0 otherwise. Further, let $\sigma = (\sigma_1, \ldots, \sigma_n)$ denote a vector containing all the
indicator variables. Then, we can write $\Pr(r(t_i) = j)$ as follows:

$$\begin{aligned}
\Pr(r(t_i) = j) &= \Pr(t_i) \sum_{pw:\, |pw \cap T_{i-1}| = j-1} \Pr(pw) \\
&= \Pr(t_i) \sum_{\sigma:\, \sum_{l=1}^{i-1} \sigma_l = j-1} \;\prod_{l < i:\, \sigma_l = 1} \Pr(t_l) \prod_{l < i:\, \sigma_l = 0} \big(1 - \Pr(t_l)\big)
\end{aligned}$$

The first equality says that tuple $t_i$ ranks at the $j$th position if and only if $t_i$ and exactly $j - 1$ tuples from $T_{i-1}$ are
present in the possible world. The second equality holds since the tuples are independent of each other. The naive
method of evaluating the above formula by explicitly listing all possible worlds needs exponential time. We now turn to
the generating function method, which yields a polynomial time algorithm. Recall that the coefficient of
$x^k$ in $\mathcal{F}(x) = \prod_{i=1}^{n} (a_i + b_i x)$ is $\sum_{|\beta| = k} \prod_{i:\, \beta_i = 0} a_i \prod_{i:\, \beta_i = 1} b_i$, where $\beta = (\beta_1, \ldots, \beta_n)$ is a Boolean vector and $|\beta|$ denotes the number
of 1's in $\beta$ [18]. Now consider the following generating function:

$$\mathcal{F}^i(x) = \Big( \prod_{t \in T_{i-1}} \big(1 - \Pr(t) + \Pr(t) \cdot x\big) \Big) \big(\Pr(t_i) \cdot x\big) = \sum_{j \ge 0} c_j x^j \qquad (1)$$

Figure 2: PRF computation on and/xor trees: the left figure corresponds to the database in Figure 1, whereas the right
figure is the and/xor tree representation of the independent tuples in Example 1.

We can see that the coefficient $c_j$ of $x^j$ in the expansion of $\mathcal{F}^i$ is exactly the probability that $t_i$ is at rank $j$, i.e.,
$\Pr(r(t_i) = j)$. We note $\mathcal{F}^i$ contains at most $i + 1$ nonzero terms. Expanding $\mathcal{F}^i$ can be easily done in $O(i^2)$ time.
This allows us to compute $\Pr(r(t_i) = j)$ for $t_i$ in $O(i^2)$ time; $\Upsilon(t_i)$, in turn, can be written as:

$$\Upsilon(t_i) = \sum_j \omega(t_i, j) \cdot \Pr(r(t_i) = j) = \sum_j \omega(t_i, j)\, c_j$$

which can be computed in $O(i^2)$ time.

Example 1 Consider a relation with 3 independent tuples $t_1$, $t_2$, $t_3$ (already sorted according to the score function)
with existence probabilities 0.5, 0.6, 0.4, respectively. The generating function for $t_3$ is:

$\mathcal{F}^3(x) = (.5 + .5x)(.4 + .6x)(.4x) = .08x + .2x^2 + .12x^3$

This gives us:

$\Pr(r(t_3) = 1) = .08$, $\Pr(r(t_3) = 2) = .2$, $\Pr(r(t_3) = 3) = .12$
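
The expansion in Example 1 can be checked mechanically. The Python sketch below multiplies out the factors of the generating function for independent tuples, representing a polynomial as a list of coefficients indexed by the power of $x$; the helper names `poly_mul` and `generating_function` are ours. For the probabilities 0.5, 0.6, 0.4, the coefficients of $x$, $x^2$, $x^3$ come out as 0.08, 0.2, 0.12, which sum to $\Pr(t_3) = 0.4$ as they must.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def generating_function(probs, i):
    """Coefficients of the generating function of t_i for independent tuples
    sorted by score; probs[l] = Pr(t_{l+1}) and i is 1-based."""
    f = [0.0, probs[i - 1]]            # the factor Pr(t_i) * x
    for p in probs[: i - 1]:           # one factor (1 - p + p*x) per higher-scored tuple
        f = poly_mul(f, [1.0 - p, p])
    return f

coeffs = generating_function([0.5, 0.6, 0.4], 3)
# coeffs[j] is the coefficient of x^j, i.e. Pr(r(t_3) = j); coeffs[0] is always 0.
```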

If we expand each $\mathcal{F}^i$ for $1 \le i \le n$ from scratch, we need $O(n^2)$ time for each $\mathcal{F}^i$ and $O(n^3)$ time in total.
However, the expansion of $\mathcal{F}^i$ can be obtained from the expansion of $\mathcal{F}^{i-1}$ in $O(i)$ time as follows:

$$\mathcal{F}^i(x) = \frac{\Pr(t_i)}{\Pr(t_{i-1})}\, \mathcal{F}^{i-1}(x) \big(1 - \Pr(t_{i-1}) + \Pr(t_{i-1})\, x\big). \qquad (2)$$

This trick gives us an $O(n^2)$ time complexity. See Algorithm 1 for the pseudocode. Note that $O(n^2)$ time is
asymptotically optimal in general since the computation involves $\Omega(n^2)$ probabilities, namely $\Pr(r(t_i) = j)$ for all
$1 \le i, j \le n$.



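Algorithm 1: IND-PRF-RANK($D_T$)

$\mathcal{F}^0(x) = 1$;
for $i = 1$ to $n$ do
    $\mathcal{F}^i(x) = \frac{\Pr(t_i)}{\Pr(t_{i-1})}\, \mathcal{F}^{i-1}(x) \big(1 - \Pr(t_{i-1}) + \Pr(t_{i-1})\, x\big)$;
    Expand $\mathcal{F}^i(x)$ in the form of $\sum_j c_j x^j$;
    $\Upsilon(t_i) = \sum_{j=1}^{n} \omega(t_i, j)\, c_j$;
return $k$ tuples with largest $\Upsilon$ values;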
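
A minimal Python sketch of this procedure follows, assuming independent tuples. It is a variation rather than a literal transcription of Algorithm 1: instead of dividing by $\Pr(t_{i-1})$ as in Equation (2), it maintains the prefix product $\prod_{l < i} (1 - \Pr(t_l) + \Pr(t_l) x)$ directly, which yields the same coefficients and avoids a division by zero when some probability is 0. The function name `ind_prf_rank`, the tuple layout, and the sample data are our own assumptions.

```python
def ind_prf_rank(tuples, weight, k):
    """tuples: list of (name, score, prob), assumed mutually independent.
    weight(name, j): the weight omega(t, j).
    Returns (names of the k tuples with largest |Upsilon|, {name: Upsilon})."""
    order = sorted(tuples, key=lambda t: -t[1])   # non-increasing score
    prefix = [1.0]    # coefficients of prod_{l < i} (1 - p_l + p_l * x)
    upsilon = {}
    for name, _, p in order:
        # The generating function of t_i is prefix(x) * p * x, so c_j = p * prefix[j - 1].
        upsilon[name] = sum(weight(name, j) * p * c
                            for j, c in enumerate(prefix, start=1))
        # Fold t_i into the prefix product for the next tuple: O(i) work.
        nxt = [0.0] * (len(prefix) + 1)
        for d, c in enumerate(prefix):
            nxt[d] += c * (1.0 - p)
            nxt[d + 1] += c * p
        prefix = nxt
    ranked = sorted(upsilon, key=lambda name: -abs(upsilon[name]))
    return ranked[:k], upsilon

# Hypothetical data and weight function: omega(t, j) = 1 only for j = 1,
# so Upsilon(t) is the probability that t is ranked first.
data = [("t1", 30, 0.5), ("t2", 20, 0.6), ("t3", 10, 0.4)]
first_rank_weight = lambda name, j: 1.0 if j == 1 else 0.0
top2, ups = ind_prf_rank(data, first_rank_weight, 2)
```

The total work is quadratic in $n$ because the prefix polynomial for $t_i$ has $i$ coefficients and each is touched a constant number of times.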



For some specific $\omega$ functions, we may be able to achieve faster running time. For PRF^ω(h) functions, we only
need to expand all $\mathcal{F}^i$'s up to the $x^h$ term since $\omega(i) = 0$ for $i > h$. Then, the expansion from $\mathcal{F}^{i-1}(x)$ to $\mathcal{F}^i(x)$ only
takes $O(h)$ time. This yields an $O(n \cdot h + n \log n)$ time algorithm. We note the above technique also gives an
$O(nk + n \log n)$ time algorithm for answering the U-Rank top-k query (all the needed probabilities can be computed
in that time), thus matching the best known upper bound by Yi et al. [36] (the original algorithm in [34] runs in $O(n^2 k)$
time).
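
The truncation argument can be sketched as follows, again assuming independent tuples that are already sorted by score. Only the coefficients of $x^1, \ldots, x^h$ are kept, so each step costs $O(h)$. As a convenience, the sketch maintains the prefix product without the $\Pr(t_i) x$ factor rather than applying Equation (2) literally. The function names `rank_probs_upto` and `u_rank` and the sample probabilities are ours.

```python
def rank_probs_upto(probs, h):
    """probs: existence probabilities of independent tuples, highest score first.
    Returns table with table[i][j-1] = Pr(r(t_{i+1}) = j) for j = 1..h, in O(n*h) time."""
    prefix = [1.0] + [0.0] * (h - 1)    # prefix product, truncated below x^h
    table = []
    for p in probs:
        table.append([p * c for c in prefix])    # c_j = Pr(t_i) * prefix[j - 1]
        # Multiply prefix by (1 - p + p*x) in place, dropping terms of degree >= h.
        for d in range(h - 1, 0, -1):
            prefix[d] = prefix[d] * (1.0 - p) + prefix[d - 1] * p
        prefix[0] *= (1.0 - p)
    return table

def u_rank(probs, k):
    """For each rank j = 1..k, the 0-based index of the tuple maximizing Pr(r(t) = j)."""
    table = rank_probs_upto(probs, k)
    return [max(range(len(probs)), key=lambda i: table[i][j]) for j in range(k)]

table = rank_probs_upto([0.5, 0.6, 0.4], 3)
answer = u_rank([0.5, 0.6, 0.4], 2)
```

Note that, under this definition, U-Rank may return the same tuple at more than one rank.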
We remark that the generating function technique can be seen as a variant of dynamic programming in some sense;
however, using it explicitly in place of the obscure recursion formula gives us a much cleaner view and allows us to
generalize it to handle more complicated tuple correlations. This also leads to an algorithm for extremely efficient
evaluation of PRF^e functions (Section 4.3).
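
For independent tuples, one way to see why PRF^e is cheap to evaluate is that, with weights $\omega(i) = \alpha^i$, the sum $\sum_j \alpha^j c_j$ is simply the generating function of $t_i$ evaluated at $x = \alpha$, so no expansion is needed. The Python sketch below illustrates this observation only; it assumes tuple independence and is not the and/xor tree algorithm referred to above. The function name `prf_e_independent` and the sample data are ours.

```python
def prf_e_independent(tuples, alpha):
    """tuples: list of (name, score, prob), assumed mutually independent;
    alpha may be a real or a complex number.
    Returns {name: Upsilon(t)}, where Upsilon(t_i) is the generating
    function of t_i evaluated at x = alpha."""
    order = sorted(tuples, key=lambda t: -t[1])   # O(n log n); skip if pre-sorted
    running = 1.0       # prod_{l < i} (1 - p_l + p_l * alpha)
    out = {}
    for name, _, p in order:                      # single O(n) pass
        out[name] = running * p * alpha
        running *= (1.0 - p + p * alpha)
    return out

# Hypothetical data with alpha = 0.5.
vals = prf_e_independent([("t1", 30, 0.5), ("t2", 20, 0.6), ("t3", 10, 0.4)], 0.5)
```

For $t_3$ this gives $0.75 \cdot 0.7 \cdot 0.4 \cdot 0.5 = 0.105$, the same value as $0.5 \cdot 0.08 + 0.25 \cdot 0.2 + 0.125 \cdot 0.12$ computed from the expanded rank probabilities.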



4.2 Probabilistic And/Xor Trees 

Next we generalize our algorithm to handle a correlated database where the correlations can be captured using an 
and/xor tree. As before, let $T = \{t_1, t_2, \ldots, t_n\}$ denote the tuples sorted in a non-increasing order of their score
function, and let $T_i = \{t_1, t_2, \ldots, t_i\}$. Let $\mathcal{T}$ denote the and/xor tree that models the correlations.

Suppose now we need to compute $\Pr(r(t_i) = j)$. Since we don't care about any tuple with a smaller score, it
suffices to consider only $\mathcal{T}_i$, the subtree of $\mathcal{T}$ induced by the leaf set $T_i$ (namely, the union of all root-leaf paths with
all leaves in $T_i$). Let $Ch(v) = \{v_1, \ldots, v_\ell\}$ denote the set of $v$'s children. Let $p_v = \sum_{v_h \in Ch(v)} p(v, v_h)$. For each
node $v \in \mathcal{T}_i$, we define the generating function $\mathcal{F}^i_v(x, y)$ inductively as follows:

• If $v$ is a leaf, $\mathcal{F}^i_v(x, y) = x$ if $v \in T_i \setminus \{t_i\}$, and $\mathcal{F}^i_v(x, y) = y$ if $v = t_i$.

• If $v$ is a ∨ (xor) node, $\mathcal{F}^i_v(x, y) = (1 - p_v) + \sum_{v_h \in Ch(v)} p(v, v_h) \cdot \mathcal{F}^i_{v_h}(x, y)$.

• If $v$ is a ∧ (and) node, $\mathcal{F}^i_v(x, y) = \prod_{v_h \in Ch(v)} \mathcal{F}^i_{v_h}(x, y)$.
The generating function $\mathcal{F}^i$ for $\mathcal{T}_i$ is the generating function of its root. The following theorem [30] states the close
relationship between the probabilities $\Pr(r(t_i) = j)$ and the coefficients of $\mathcal{F}^i$.

Theorem 1 [30] Let $c_j$ be the coefficient of the term $x^{j-1} y$ in the generating function $\mathcal{F}^i(x, y)$ defined as above. We
have that: $\Pr(r(t_i) = j) = c_j$.

Proof: Suppose $\mathcal{T}_i$ is rooted at $r$, $r_1, \ldots, r_h$ are $r$'s children and $\mathcal{T}^i_l$ is the subtree rooted at $r_l$. W.l.o.g., we assume
$t_i \in \mathcal{T}^i_h$. We denote by $S^i$ ($S^i_l$) the random set of leaves generated according to model $\mathcal{T}_i$ ($\mathcal{T}^i_l$). We write
$\mathcal{F}^i(x, y) = \sum_j c'_j x^j + (\sum_j c_j x^{j-1}) y$. We prove by induction on the height of the and/xor tree the following claims:
$\Pr(|S^i| = j \wedge t_i \in S^i) = c_j$ and $\Pr(|S^i| = j \wedge t_i \notin S^i) = c'_j$. We consider two cases. If $r$ is an and ($\wedge$) node, we know from
Definition 1 that $S^i = \cup_{l=1}^{h} S^i_l$. Therefore,

$$\Pr(|S^i| = j \wedge t_i \in S^i) = \Pr\Big(\sum_{l=1}^{h} |S^i_l| = j \wedge t_i \in S^i_h\Big)
 = \sum_{\sum_l j_l = j} \Big(\prod_{l=1}^{h-1} \Pr(|S^i_l| = j_l)\Big) \Pr(|S^i_h| = j_h \wedge t_i \in S^i_h).$$

Let $\mathcal{F}^i_l = \sum_j c'_{lj} x^j + (\sum_j c_{lj} x^{j-1}) y$ be the polynomial defined by $\mathcal{T}^i_l$. Now let us consider the coefficient $c_j$ of the term
$x^{j-1} y$ in $\mathcal{F}^i(x, y) = \prod_{l=1}^{h} \mathcal{F}^i_l(x, y)$. From Algorithm 2, we know the degree of $y$ in $\mathcal{F}^i_l$ is 0 for $1 \le l \le h - 1$.
So only the terms in $\mathcal{F}^i_h$ can contribute to the $y$ terms in $\mathcal{F}^i$. Therefore, $c_j = \sum_{\sum_l j_l = j} \big(\prod_{l=1}^{h-1} c'_{l j_l}\big) c_{h j_h}$. Then it
is easy to see that, from the induction hypothesis, the theorem follows. The other case, when $r$ is an xor ($\vee$) node, can be proved by
simply observing that $\Pr(|S^i| = j \wedge t_i \in S^i) = \Pr(|S^i_h| = j \wedge t_i \in S^i_h)\, p(r, r_h)$ and that the coefficient $c_j$ of the term $x^{j-1} y$ in
$\mathcal{F}^i$ is $p(r, r_h)\, c_{hj}$. □



Example 2 The generating function $\mathcal{F}^5$ for the left hand side tree in Figure 2 is $(.6 + .4x)\, x\, (.4x + .6y) = .24x^2 +
.16x^3 + .36xy + .24x^2 y$. So we get that $\Pr(r(t_5) = 3) = .24$. From Figure 1, we can also see $\Pr(r(t_5) = 3) =
\Pr(pw_2) + \Pr(pw_4) = .24$. The right hand side of Figure 2 shows the probabilistic and/xor tree and the generating
function computation for Example 1.

Algorithm 2: ANDOR-PRF-RANK($\mathcal{T}$)

    $\mathcal{T}_0 = \emptyset$;
    for i = 1 to n do
        $\mathcal{T}_i = \mathcal{T}_{i-1} \cup$ the path from $t_i$ to the root;
        $\mathcal{F}^i(x, y)$ = GENE($\mathcal{T}_i$, $t_i$);
        Expand $\mathcal{F}^i(x, y)$ in the form $\sum_j c'_j x^j + (\sum_j c_j x^{j-1}) y$;
        $\Upsilon(t_i) = \sum_{j=1}^{n} \omega(t_i, j)\, c_j$;
    return k tuples with largest $\Upsilon$ values;

    Subroutine: GENE($\mathcal{T}$, $t$)
    if $\mathcal{T}$ is a singleton node then
        if $\mathcal{T} = \{t\}$ then return $y$ else return $x$
    else
        let $r$ be the root of $\mathcal{T}$ and $\mathcal{T}_l$ the subtree rooted at $r_l$, for $r_l \in Ch(r)$;
        if $r$ is an xor ($\vee$) node then
            $p = \sum_{r_l \in Ch(r)} p(r, r_l)$;
            return $1 - p + \sum_{r_l \in Ch(r)} p(r, r_l) \cdot$ GENE($\mathcal{T}_l$, $t$);
        if $r$ is an and ($\wedge$) node then
            return $\prod_{r_l \in Ch(r)}$ GENE($\mathcal{T}_l$, $t$);

See Algorithm 2 for the pseudocode of the algorithm.

If we expand $\mathcal{F}^i_v$ for each internal node $v$ in a naive way (i.e., we do polynomial multiplication one by one), we
can show the running time is $O(n^2)$ at each internal node and thus $O(n^3)$ overall. If we do divide-and-conquer at each
internal node and apply FFT (Fast Fourier Transform) for the multiplication of polynomials, the running time can
be improved to $O(n^2 \log^2 n)$. The details can be found in Appendix A.
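The GENE recursion can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the tuple-based tree encoding, the dictionary representation of bivariate polynomials, and the small tree at the end are all assumptions made here (the tree is chosen so that its generating function matches the polynomial quoted in Example 2). The tree passed in is assumed to already be the induced subtree for the target tuple, and polynomials are multiplied naively, without FFT.

```python
from collections import defaultdict

# A bivariate polynomial is a dict {(deg_x, deg_y): coefficient}.
def poly_mul(p, q):
    out = defaultdict(float)
    for (ax, ay), ca in p.items():
        for (bx, by), cb in q.items():
            out[(ax + bx, ay + by)] += ca * cb
    return dict(out)

def poly_add(p, q, scale=1.0):
    out = defaultdict(float, p)
    for key, c in q.items():
        out[key] += scale * c
    return dict(out)

# Hypothetical tree encoding (an assumption of this sketch):
#   ("leaf", name), ("and", [children]), ("xor", [(edge_prob, child), ...])
def gene(node, target):
    kind = node[0]
    if kind == "leaf":
        # y marks the target tuple, x marks every other (higher-scored) leaf
        return {(0, 1): 1.0} if node[1] == target else {(1, 0): 1.0}
    if kind == "and":
        result = {(0, 0): 1.0}
        for child in node[1]:
            result = poly_mul(result, gene(child, target))
        return result
    # xor node: (1 - p) + sum over children of p(v, child) * GENE(child)
    p_total = sum(p for p, _ in node[1])
    result = {(0, 0): 1.0 - p_total}
    for p, child in node[1]:
        result = poly_add(result, gene(child, target), scale=p)
    return result

def rank_distribution(tree, target):
    """Return {j: Pr(r(target) = j)} from the coefficients of x^(j-1) * y."""
    return {dx + 1: c for (dx, dy), c in gene(tree, target).items() if dy == 1}

# Hypothetical tree whose generating function is (.6 + .4x) * x * (.4x + .6y)
tree = ("and", [
    ("xor", [(0.4, ("leaf", "a"))]),
    ("leaf", "b"),
    ("xor", [(0.4, ("leaf", "c")), (0.6, ("leaf", "t"))]),
])
dist = rank_distribution(tree, "t")
```

Reading off the coefficient of $x^2 y$ gives the probability that the target tuple is ranked third, as in Example 2.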

4.3 Computing a PRF^e Function

Next we present an $O(n \log n)$ algorithm to evaluate a PRF^e function (the algorithm runs in linear time if the dataset
is pre-sorted by score). If $\omega(i) = \alpha^i$, then we observe that:

$$\Upsilon(t_i) = \sum_{j=1}^{n} \Pr(r(t_i) = j)\, \alpha^j = \mathcal{F}^i(\alpha) \qquad (3)$$

This surprisingly simple relationship suggests we don't have to expand the polynomials $\mathcal{F}^i(x)$ at all; instead we can
evaluate the numerical value of $\mathcal{F}^i(\alpha)$ directly. Again, we note that the value $\mathcal{F}^i(\alpha)$ can be computed from the value
of $\mathcal{F}^{i-1}(\alpha)$ in $O(1)$ time using Equation (2). Thus, we have an $O(n)$ time algorithm to compute $\Upsilon(t_i)$ for all $1 \le i \le n$
if the tuples are pre-sorted.

Example 3 Consider Example 1 and the PRF^e function for $t_3$. We choose $\omega(i) = .6^i$. Then, we can see that
$\mathcal{F}^3(x) = (.5 + .5x)(.4 + .6x)(.4x)$. So, $\Upsilon(t_3) = \mathcal{F}^3(.6) = (.5 + .5 \times .6)(.4 + .6 \times .6)(.4 \times .6) = .14592$.
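For independent tuples, the linear-time evaluation can be sketched as follows. The function names are illustrative assumptions, and the sketch keeps a running prefix product of the factors $(1 - p_j + p_j \alpha)$ instead of using the ratio form of the recurrence; it assumes the probabilities are listed in non-increasing score order and that $\alpha$ is real, so the values can be sorted directly.

```python
def prf_e_independent(probs, alpha):
    """probs[i] is the existence probability of the (i+1)-th tuple in
    non-increasing score order. Returns the list of values
    prod_{j<i} (1 - p_j + p_j*alpha) * p_i * alpha, one per tuple."""
    values = []
    prefix = 1.0  # product of the factors of all higher-scored tuples
    for p in probs:
        values.append(prefix * p * alpha)
        prefix *= 1.0 - p + p * alpha
    return values

def top_k(probs, alpha, k):
    """Indices (0-based) of the k tuples with the largest PRF^e values."""
    values = prf_e_independent(probs, alpha)
    return sorted(range(len(probs)), key=lambda i: -values[i])[:k]

# Probabilities read off the factors of the polynomial in Example 3
values = prf_e_independent([0.5, 0.6, 0.4], 0.6)
```

With the three probabilities read off Example 3 and $\alpha = .6$, the third value reproduces the number computed there.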

We can use a similar idea to speed up the computation if the tuples are correlated and the correlations are represented
using an and/xor tree. Suppose the generating function for $\mathcal{T}_i$ is $\mathcal{F}^i(x, y) = \sum_j c'_j x^j + (\sum_j c_j x^{j-1}) y$ and $\Upsilon(t_i) =
\sum_{j=1}^{n} \alpha^j c_j$. We observe an intriguing relationship between the PRF^e value and the generating function:

$$\Upsilon(t_i) = \sum_j c_j \alpha^j = \Big(\sum_j c'_j \alpha^j + \big(\sum_j c_j \alpha^{j-1}\big)\alpha\Big) - \sum_j c'_j \alpha^j
 = \mathcal{F}^i(\alpha, \alpha) - \mathcal{F}^i(\alpha, 0).$$

Figure 3: Approximating functions using linear combinations of complex exponentials. Panels: (i) $\omega(i)$ = step function, $L = 20$; (ii) $\omega(i)$ = step function; (iii) $\omega(i) = 1000 - i$ for $i \le 1000$, and 0 for $i > 1000$; (iv) $\omega(i)$ = an arbitrary smooth function.

Given this, $\Upsilon(t_i)$ can be computed in linear time by bottom-up evaluation of $\mathcal{F}^i(\alpha, \alpha)$ and $\mathcal{F}^i(\alpha, 0)$ in $\mathcal{T}_i$. If we
simply repeat it $n$ times, once for each $t_i$, this gives us an $O(n^2)$ total running time.

By carefully sharing the intermediate results among the computations of $\Upsilon(t_i)$, we can improve the running time to
$O(n \log n + nd)$ where $d$ is the height of the and/xor tree. We sketch this algorithm, which runs in iterations. Suppose
the tuples are already pre-sorted by their scores. In iteration $i$, leaf $t_i$ (the $i$'th tuple in score order) is added to the tree
and the computations are done along the path from $t_i$ to the root. Specifically, the algorithm maintains the following
information in each inner node $v$: the numerical values of $\mathcal{F}^i_v(\alpha, \alpha)$ and $\mathcal{F}^i_v(\alpha, 0)$. The values on node $v$ need to be
updated when the value of one of its children changes. Therefore, in each iteration, the computation only happens on
the path from $t_i$ to the root. Since we update at most $d$ nodes for each newly added node, the running time is $O(nd)$.
The updating rule for $\mathcal{F}^i_v(\cdot, \cdot)$ (both $\mathcal{F}^i_v(\alpha, \alpha)$ and $\mathcal{F}^i_v(\alpha, 0)$) in node $v$ is as follows. We assume $v$'s child, say $u$, just
had its values changed.

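1. $v$ is an and ($\wedge$) node, then: $\mathcal{F}^i_v(\cdot, \cdot) \leftarrow \mathcal{F}^{i-1}_v(\cdot, \cdot)\, \mathcal{F}^i_u(\cdot, \cdot) / \mathcal{F}^{i-1}_u(\cdot, \cdot)$.

2. $v$ is an xor ($\vee$) node, then: $\mathcal{F}^i_v(\cdot, \cdot) \leftarrow \mathcal{F}^{i-1}_v(\cdot, \cdot) + p(v, u)\, \mathcal{F}^i_u(\cdot, \cdot) - p(v, u)\, \mathcal{F}^{i-1}_u(\cdot, \cdot)$.

We note that, for the case of x-tuples, which can be represented using a two-level tree, this gives us an $O(n \log n)$
algorithm for ranking according to PRF^e.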
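The identity between the PRF^e value and the two numerical evaluations of the generating function can be sketched directly. The sketch below is the simple per-tuple evaluation (one bottom-up pass per tuple), not the incremental scheme that shares work across iterations. The tree encoding and the example tree are assumptions made for illustration. Leaves with a lower score than the target evaluate to the constant 1, which has the same effect as removing them from the induced subtree.

```python
# Hypothetical tree encoding (an assumption of this sketch):
#   ("leaf", rank), ("and", [children]), ("xor", [(edge_prob, child), ...])
# where rank is the 1-based position of the tuple in score order.
def evaluate(node, target_rank, x, y):
    """Numerically evaluate the generating function for the target tuple."""
    kind = node[0]
    if kind == "leaf":
        rank = node[1]
        if rank == target_rank:
            return y
        # higher-scored leaves contribute x; lower-scored ones are ignored (1)
        return x if rank < target_rank else 1.0
    if kind == "and":
        value = 1.0
        for child in node[1]:
            value *= evaluate(child, target_rank, x, y)
        return value
    # xor node: (1 - p) + sum over children of p(v, child) * value(child)
    p_total = sum(p for p, _ in node[1])
    return 1.0 - p_total + sum(
        p * evaluate(child, target_rank, x, y) for p, child in node[1]
    )

def prf_e(tree, target_rank, alpha):
    """PRF^e value as F(alpha, alpha) - F(alpha, 0)."""
    return (evaluate(tree, target_rank, alpha, alpha)
            - evaluate(tree, target_rank, alpha, 0.0))

# Hypothetical four-leaf tree with generating function (.6 + .4x) * x * (.4x + .6y)
tree = ("and", [
    ("xor", [(0.4, ("leaf", 1))]),
    ("leaf", 2),
    ("xor", [(0.4, ("leaf", 3)), (0.6, ("leaf", 4))]),
])
```

For the last leaf and $\alpha = .6$, this returns $.36 \times .6^2 + .24 \times .6^3 = .18144$, which agrees with expanding the polynomial and weighting its $y$-coefficients by $\alpha^j$.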

4.4 Attribute Uncertainty or Uncertain Scores 

We briefly sketch how we can do ranking over tuples with discrete attribute uncertainty where the uncertain attributes
are part of the tuple scoring function (if the uncertain attributes do not affect the tuple score, then they can be ignored
for ranking purposes). More generally, this approach can handle the case when there is a discrete probability
distribution over the score of the tuple.

The algorithm works by treating the alternatives of the tuples (with a separate alternative for each different possible
score for the tuple) as different tuples, and adding an xor constraint over the alternatives. We can then use the algorithm
for the probabilistic and/xor tree model to find the values of the PRF function for each resulting tuple separately. In
a final step, we calculate the $\Upsilon$ score for each original tuple by adding the $\Upsilon$ scores of its alternatives. If the original
tuples were independent, the complexity of this algorithm is $O(n^2)$ for computing the PRF function, and $O(n \log n)$
for computing the PRF^e function, where $n$ is the size of the input, i.e., the total number of different possible scores.



5 Approximating and Learning Ranking Functions 

In this section, we discuss how to choose the PRF functions and their parameters. Depending on the application
domain and the scenarios, there are two approaches to this: (1) If we know the ranking function we would like to use
(say PT(h)), then we can either simulate or approximate it using appropriate PRF functions; (2) If we are instead
provided user preferences data, we can learn the parameters from them. Clearly, we would prefer to use a PRF^e
function, if possible, since it admits highly efficient ranking algorithms. For this purpose, we begin by presenting
an algorithm to find an approximation to an arbitrary PRF^ω function using a linear combination of PRF^e functions.
We then discuss how to learn a PRF^ω function from user preferences, and finally present an algorithm for learning a
single PRF^e function.



5.1 Approximating PRF^ω using PRF^e Functions

A linear combination of complex exponential functions is known to be very expressive, and can approximate many
other functions very well [4]. Specifically, given a PRF^ω function, if we can write $\omega(i)$ as $\omega(i) \approx \sum_{l=1}^{L} u_l \alpha_l^i$, then
we have that:

$$\Upsilon(t) = \sum_i \omega(i) \Pr(r(t) = i) \approx \sum_{l=1}^{L} u_l \Big(\sum_i \alpha_l^i \Pr(r(t) = i)\Big)$$

This reduces the computation of $\Upsilon(t)$ to $L$ individual PRF^e function computations, each of which only takes linear
time. This gives us an $O(n \log n + nL)$ time algorithm for approximately ranking using a PRF^ω function for
independent tuples (as opposed to $O(n^2)$ for exact ranking).
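The reduction above can be checked numerically on a toy instance. The following sketch assumes independent tuples and uses a weight function that is exactly a sum of two exponentials, so the two sides agree up to rounding; all names and the chosen constants are illustrative assumptions.

```python
def rank_probabilities(probs, i):
    """{j: Pr(r(t_i) = j)} for independent tuples (i is 0-based), obtained by
    expanding prod_{j<i} (1 - p_j + p_j x) * p_i x coefficient by coefficient."""
    poly = [1.0]
    for p in probs[:i]:
        nxt = [0.0] * (len(poly) + 1)
        for d, c in enumerate(poly):
            nxt[d] += c * (1.0 - p)
            nxt[d + 1] += c * p
        poly = nxt
    return {d + 1: c * probs[i] for d, c in enumerate(poly)}

def prf_omega(probs, i, omega):
    """Exact value: sum_j omega(j) * Pr(r(t_i) = j)."""
    return sum(omega(j) * pr for j, pr in rank_probabilities(probs, i).items())

def prf_e(probs, i, alpha):
    """Single-exponential value, evaluated without expanding the polynomial."""
    prefix = 1.0
    for p in probs[:i]:
        prefix *= 1.0 - p + p * alpha
    return prefix * probs[i] * alpha

def prf_combination(probs, i, terms):
    """terms is a list of (u_l, alpha_l) pairs."""
    return sum(u * prf_e(probs, i, a) for u, a in terms)

probs = [0.5, 0.6, 0.4]
terms = [(2.0, 0.5), (3.0, 0.9)]                  # hypothetical u_l, alpha_l
omega = lambda j: sum(u * a ** j for u, a in terms)
```

When $\omega$ is only approximated by the exponentials, the two values differ by the approximation error weighted by the rank probabilities.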

Several techniques have been proposed for finding such approximations using complex exponentials [26, 4]. Those
techniques are however computationally inefficient, involving computation of the inverses of large matrices and the
roots of polynomials of high orders, and may be numerically unstable.

In this section, we present a clean and efficient algorithm, based on Discrete Fourier Transforms (DFT), for
approximating a function $\omega(i)$ that approaches zero for large values of $i$. As we noted earlier, this captures the typical
behavior of the $\omega(i)$ function. An example of such a function is the step function ($\omega(i) = 1\ \forall i \le h$, $= 0\ \forall i > h$) which
corresponds to the ranking function PT(h). At a high level, our algorithm starts with a DFT approximation of $\omega(i)$
and then adapts it by adding several damping, scaling and shifting factors.

Discrete Fourier transformation (DFT) is a well known technique for representing a function as a linear
combination of complex exponentials (also called frequency domain representation). More specifically, a discrete function
$\omega(i)$ defined on a finite domain $[0, N-1]$ can be decomposed into exactly $N$ exponentials as:

$$\omega(i) = \frac{1}{N} \sum_{k=0}^{N-1} \psi(k)\, e^{\frac{2\pi j}{N} k i}, \qquad i = 0, \ldots, N-1, \qquad (4)$$

where $\psi(0), \ldots, \psi(N-1)$ denotes the DFT transform of $\omega(0), \ldots, \omega(N-1)$ and $j$ is the imaginary unit. If we want to approximate $\omega$ by fewer,
say $L$, exponentials, we can instead use the $L$ DFT coefficients with maximum absolute value. For clarity, we assume
that $\psi(0), \ldots, \psi(L-1)$ are those coefficients. Then our approximation $\tilde{\omega}^{DFT}_L$ of $\omega$ by $L$ exponentials is given by:

$$\tilde{\omega}^{DFT}_L(i) = \frac{1}{N} \sum_{k=0}^{L-1} \psi(k)\, e^{\frac{2\pi j}{N} k i}. \qquad (5)$$

However, DFT utilizes only complex exponentials of unit norm, i.e., $e^{jr}$ (where $r$ is a real), which makes this
approximation periodic (with a period of $N$). If we make $N$ sufficiently large, say larger than the total number
of tuples, then we usually need a large number of exponentials ($L$) to get a reasonable approximation. Moreover,
computing DFT for very large $N$ is computationally non-trivial. Furthermore, the number of tuples $n$ may not be
known in advance.

We next present a set of nontrivial tricks to adapt the base DFT approximation to overcome these shortcomings.
To illustrate our method, we use the step function

$$\omega(i) = \begin{cases} 1, & i < N \\ 0, & i \ge N \end{cases}$$

as our running example to show our method and the specific shortcoming it addresses. We assume $\omega(i)$ takes non-zero
values within the interval $[0, N-1]$ and the absolute values of both $\omega(i)$ and $\tilde{\omega}^{DFT}_L(i)$ are bounded by $B$.

1. (DFT) We perform pure DFT on the domain $[1, aN]$, where $a$ is a small integer constant (typically $\le 10$).

2. (Damping Factor (DF)) We introduce a damping factor $\eta < 1$ such that $B \eta^{aN} \le \epsilon$ where $\epsilon$ is a small positive
real (for example, $10^{-5}$). Our new approximation becomes:

$$\tilde{\omega}^{DFT+DF}_L(i) = \eta^i \cdot \tilde{\omega}^{DFT}_L(i) = \frac{1}{N} \sum_{k=0}^{L-1} \psi(k) \big(\eta\, e^{\frac{2\pi j}{N} k}\big)^i. \qquad (6)$$

By incorporating this damping factor, we have that $\lim_{i \to +\infty} \tilde{\omega}^{DFT+DF}_L(i) = 0$. Especially, $\tilde{\omega}^{DFT+DF}_L(i) \le \epsilon$ for $i > aN$.

3. (Initial Scaling (IS)) Use of the damping factor gives a biased approximation when $i$ is small (see Figure 3(i)).
Taking the step function as an example, $\tilde{\omega}^{DFT+DF}_L(i)$ is approximately $\eta^i$ for $0 \le i < N$ instead of 1. To
rectify this, we initially perform DFT on a different sequence $\hat{\omega}(i) = \eta^{-i} \omega(i)$ (rather than $\omega(i)$) on the domain
$i \in [0, aN]$. This gives us an unbiased approximation, which we denote by $\tilde{\omega}^{DFT+DF+IS}_L$.

4. (Extending and Shifting (ES)) This trick is in particular tailored for optimizing the approximation performance
for ranking functions. DFT does not perform well at discontinuous points, specifically at $i = 0$ (the left
boundary), which can significantly affect the ranking approximation. To handle this, we extrapolate $\omega$ to make it
continuous around 0. Let the resulting function be $\bar{\omega}$, which is defined on $[-bN, +\infty]$ for small $b > 0$. Again, taking
the step function for example,

$$\bar{\omega}(i) = \begin{cases} 1, & -bN \le i < N \\ 0, & i \ge N \end{cases}$$

Then, we shift $\bar{\omega}(i)$ rightwards by $bN$ to make
its domain lie entirely in the positive axis, do initial scaling and perform DFT on the resulting sequence. We denote
the approximation of the resulting sequence by $\tilde{\omega}'(i)$ (by performing (6)). For the approximation of the original $\omega(i)$
values, we only need to do the corresponding leftward shifting, namely $\tilde{\omega}^{DFT+DF+IS+ES}_L(i) = \tilde{\omega}'(i + bN)$. We
can see from Figure 3(i) that DFT+DF+IS+ES produces a much better approximation than others around $i = 0$.

Figures 3(i) and (ii) illustrate the efficacy of our approximation technique for the step function. As we can see,
we are able to approximate that function very well with just 20 or 30 coefficients. Figures 3(iii) and (iv) show the
approximations for a piecewise linear function and an arbitrarily generated continuous function respectively, both of
which are much easier to approximate than the step function.
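The first three steps (DFT, damping factor, initial scaling) can be sketched in plain Python as below. This is a rough illustration under several assumptions: the function and parameter names are ours, the DFT is a naive quadratic-time implementation chosen only to keep the example self-contained, the transform length is taken to be $M = aN$ over indices $0, \ldots, M-1$, and the extending-and-shifting step is omitted.

```python
import cmath

def dft(seq):
    """Naive O(M^2) DFT: psi[k] = sum_i seq[i] * exp(-2*pi*j*k*i / M)."""
    M = len(seq)
    return [sum(seq[i] * cmath.exp(-2j * cmath.pi * k * i / M) for i in range(M))
            for k in range(M)]

def exp_approx(omega, N, a=4, L=20, eps=1e-5):
    """Return (u, alpha) with omega(i) ~ Re(sum_l u[l] * alpha[l]**i)."""
    M = a * N
    B = max(abs(omega(i)) for i in range(N))       # bound on |omega|
    eta = (eps / B) ** (1.0 / M)                   # damping: B * eta**M == eps
    scaled = [omega(i) * eta ** (-i) for i in range(M)]   # initial scaling
    psi = dft(scaled)
    # keep the L coefficients with the largest absolute value
    top = sorted(range(M), key=lambda k: -abs(psi[k]))[:L]
    u = [psi[k] / M for k in top]
    alpha = [eta * cmath.exp(2j * cmath.pi * k / M) for k in top]
    return u, alpha

def evaluate(u, alpha, i):
    """Value of the approximation at position i (real part)."""
    return sum(ul * al ** i for ul, al in zip(u, alpha)).real

step = lambda i: 1.0 if i < 10 else 0.0
u_full, alpha_full = exp_approx(step, N=10, a=4, L=40)   # keeps every coefficient
```

Keeping all $M$ coefficients reproduces the function exactly on the transform domain, which is a convenient sanity check; smaller values of `L` trade accuracy for fewer PRF^e evaluations. Each returned pair `(u[l], alpha[l])` corresponds to one exponential term of the linear combination.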



5.2 Learning a PRF^ω or PRF^e Function

Next we address the question of how to learn the weights of a PRF^ω function or the $\alpha$ for a single PRF^e function
from user preferences. To learn a linear combination of PRF^e functions, we first learn a PRF^ω function and then
approximate it as above.

Prior work on learning ranking functions (e.g., [22, 28, 5, 10]) assumes that the user preferences are provided in
the form of a set of pairs of tuples, and for each pair, we are told which tuple is ranked higher. Our problem differs
slightly from this prior work in that the features that we use to rank the tuples (i.e., $\Pr(r(t) = i)$, $i = 1, \ldots, n$)
cannot be computed for each tuple individually, but must be computed for the entire dataset (since the features for a
tuple depend on the other tuples in the dataset). Hence, we assume that we are instead given a small sample of the
tuples, and the user ranking for all those tuples. We compute the features assuming this sample constitutes the entire
relation, and learn a ranking function accordingly, with the goal of finding the parameters (the weights $w_i$ for PRF^ω or
the parameter $\alpha$ for PRF^e) that minimize the number of disagreements with the provided ranking over the samples.

Given this, the problem of learning PRF^ω is identical to the problem addressed in the prior work, and we utilize
the algorithm based on support vector machines (SVM) [28] in our experiments.

On the other hand, we are not aware of any work that has addressed learning a ranking function like PRF^e.
We use a simple binary search-like heuristic to find the optimal real value of $\alpha$ that minimizes the Kendall
distance between the user-specified ranking and the ranking according to PRF^e($\alpha$). In other words, we try to find
$\arg\min_{\alpha \in [0,1]} dis(\sigma, \sigma(\alpha))$ where $dis()$ is the Kendall distance between two rankings, $\sigma$ is the ranking for the given
sample and $\sigma(\alpha)$ is the one obtained by using the PRF^e($\alpha$) function. Suppose we want to find the optimal $\alpha$ within the
interval $[L, U]$ now. We first compute $dis(\sigma, \sigma(L + i \cdot \frac{U-L}{10}))$ for $i = 1, \ldots, 9$ and find the $i$ for which the distance is the
smallest. Then we reduce our search range to $[\max(L, L + (i-1) \cdot \frac{U-L}{10}), \min(U, L + (i+1) \cdot \frac{U-L}{10})]$ and repeat
the above recursively. Although this algorithm can only converge to a local minimum, in our experimental study,
we observed that all of the prior ranking functions exhibit a uni-valley behavior (Section 7), and in such cases, this
algorithm finds the global optimum.
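The search heuristic can be sketched as follows. The distance function is supplied by the caller (in the setting above it would be the Kendall distance between the user ranking and the PRF^e ranking for a given $\alpha$); the grid spacing of one tenth of the current interval, the fixed iteration count, and the names are assumptions of this sketch.

```python
def search_alpha(dis, lo=0.0, hi=1.0, iterations=20):
    """Repeatedly evaluate dis at nine evenly spaced interior points of
    [lo, hi], keep the best one, and shrink the interval around it."""
    best = lo
    for _ in range(iterations):
        step = (hi - lo) / 10.0
        candidates = [lo + i * step for i in range(1, 10)]
        best = min(candidates, key=dis)
        # new range: one grid step on either side of the best point,
        # clipped to the current interval
        lo, hi = max(lo, best - step), min(hi, best + step)
    return best

# Toy single-valley objective standing in for the Kendall distance
toy_distance = lambda a: abs(a - 0.3712345)
found = search_alpha(toy_distance)
```

Each iteration shrinks the interval to at most one fifth of its previous width, so a handful of iterations already pins $\alpha$ down to many digits when the objective has a single valley.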

6 PRF Computation for Arbitrary Correlations 

Among many models for capturing the correlations in a probabilistic database, graphical models (Markov or Bayesian
networks) perhaps represent the most systematic approach [33]. The appeal of graphical models stems both from
the pictorial representation of the dependencies, and a rich literature on doing inference over them. In this section,
we extend our generating function-based algorithm for computing PRF to handle correlations represented using a
graphical model. The resulting algorithm is a non-trivial dynamic program over the junction tree of the graphical
model, combined with the generating function method. Our main result is that we can compute the PRF function in
polynomial time if the junction tree of the graphical model has bounded treewidth. It is worth noting that this result does
not subsume our algorithm for and/xor trees (Section 4.2), since the treewidth of the moralized graph of a probabilistic
and/xor tree may not be bounded.

Definitions: We start by briefly reviewing some notations and definitions related to graphical models and junction
trees. Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of tuples in $D_T$, sorted in non-increasing order of their score values.
For each tuple $t$ in $T$, we associate an indicator random variable $X_t$, which is 1 if $t$ is present, and 0 otherwise.
Let $\mathcal{X} = \{X_{t_1}, \ldots, X_{t_n}\}$ and $\mathcal{X}_i = \{X_{t_1}, \ldots, X_{t_i}\}$. The correlations among these variables may be represented
using either a directed or an undirected graphical model; we however assume that we are provided with an equivalent
junction tree over the variables (which can be constructed using standard algorithms [15]).

Let $\mathcal{T}$ be a tree with each node $v$ associated with a subset $C_v \subseteq \mathcal{X}$. We say $\mathcal{T}$ is a junction tree if any intersection
$C_u \cap C_v$ for any $u, v \in \mathcal{T}$ is contained in $C_w$ for every node $w$ on the unique path between $u$ and $v$ in $\mathcal{T}$ (this is
called the running intersection property). The treewidth of a junction tree is defined to be $\max_{v \in \mathcal{T}} |C_v| - 1$. Denote
$S_{u,v} = C_v \cap C_u$ for each $(u, v) \in \mathcal{T}$. We call $S_{u,v}$ a separator since the removal of $S_{u,v}$ disconnects the graphical
model.

Example 4 Let $T = \{t_1, t_2, t_3, t_4, t_5\}$. Figure 4(i) and (ii) show the (undirected) graphical model and the
corresponding junction tree $\mathcal{T}$. $\mathcal{T}$ has four nodes $v_1, v_2, v_3, v_4$. $C_{v_1} = \{X_{t_4}, X_{t_5}\}$, $C_{v_2} = \{X_{t_4}, X_{t_3}\}$, $C_{v_3} = \{X_{t_3}, X_{t_1}\}$
and $C_{v_4} = \{X_{t_3}, X_{t_2}\}$. The treewidth of $\mathcal{T}$ is 1. We have, $S_{v_1,v_2} = \{X_{t_4}\}$, $S_{v_2,v_3} = \{X_{t_3}\}$ and $S_{v_2,v_4} = \{X_{t_3}\}$.
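The two definitions above (running intersection property and treewidth) are easy to check mechanically. The sketch below does so for the four-clique tree of Example 4, with clique contents as we read them from the example; the dictionary-based encoding and the function names are assumptions made for illustration.

```python
from itertools import combinations

def path(adj, u, v):
    """Unique path between u and v in a tree given as an adjacency dict."""
    stack = [(u, [u])]
    while stack:
        node, trail = stack.pop()
        if node == v:
            return trail
        for nxt in adj[node]:
            if nxt not in trail:
                stack.append((nxt, trail + [nxt]))
    return []

def is_junction_tree(cliques, adj):
    """Running intersection property: C_u & C_v is contained in every
    clique on the path between u and v."""
    for u, v in combinations(cliques, 2):
        shared = cliques[u] & cliques[v]
        if any(not shared <= cliques[w] for w in path(adj, u, v)):
            return False
    return True

def treewidth(cliques):
    return max(len(c) for c in cliques.values()) - 1

cliques = {
    "v1": {"X4", "X5"}, "v2": {"X4", "X3"},
    "v3": {"X3", "X1"}, "v4": {"X3", "X2"},
}
adj = {"v1": ["v2"], "v2": ["v1", "v3", "v4"], "v3": ["v2"], "v4": ["v2"]}
# A rearranged tree over the same cliques that violates the property
bad_adj = {"v3": ["v1"], "v1": ["v3", "v2"], "v2": ["v1", "v4"], "v4": ["v2"]}
```

Rearranging the same cliques into a chain that separates the two cliques sharing $X_{t_3}$ by a clique that does not contain it breaks the property, which the check detects.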

We associate each clique $C_v$ (and each separator $S_{u,v}$) with a potential $\pi_v(C_v)$ (resp. $\pi_{u,v}(S_{u,v})$), which is a
function over all variables $X_{t_i} \in C_v$ ($X_{t_i} \in S_{u,v}$) and represents the correlations among those variables. Without loss
of generality, we assume that the potentials are calibrated, that is, the potential corresponding to a clique (separator)
is exactly the joint probability distribution over the variables in that clique (separator). Given a junction tree with
arbitrary potentials, calibrated potentials can be computed using the message passing algorithm 031 .

For a set of variables $S$, we use $\Pr(S)$ to denote the joint probability distribution over those variables. Then
the joint probability distribution of $\mathcal{X}$, whose correlations can be captured using a calibrated junction tree $\mathcal{T}$, can be
written as:

$$\Pr(\mathcal{X}) = \frac{\prod_{v \in \mathcal{T}} \pi_v(C_v)}{\prod_{(u,v) \in \mathcal{T}} \pi_{u,v}(S_{u,v})} = \frac{\prod_{v \in \mathcal{T}} \Pr(C_v)}{\prod_{(u,v) \in \mathcal{T}} \Pr(S_{u,v})}$$

Algorithm Sketch: Our dynamic programming-based algorithm computes $\Pr(r(t_i) = h)$ given any $t_i \in T$, for all
$1 \le h \le n$, in polynomial time if the treewidth of $\mathcal{T}$ is bounded by a constant. To compute $\Upsilon(t_i)$ for all $i$, we need to
run the algorithm $n$ times, once for each $t_i$.

The algorithm begins by rooting $\mathcal{T}$ at a node $root$ such that $X_{t_i} \in C_{root}$. The dynamic program then runs bottom
up, from the leaves to the root of the junction tree $\mathcal{T}$. Let $\mathcal{T}_v$ denote the subtree rooted at a node $v$ in the junction
tree. For each node $v$ in $\mathcal{T}$, we maintain a polynomial number of states (which we call configurations). Each state
represents a random event of the subtree $\mathcal{T}_v$. In each step, we compute the probability of one state of node $v$ based on
the probabilities of the states of $v$'s children, which we assume have been already computed. By aggregating a constant
number of probabilities of the root's states, we can get $\Pr(r(t_i) = h)$.

Figure 4: (i) A graphical model; (ii) A junction tree for the model along with the (calibrated) potentials.

Specifically, for each such node, we recursively compute the following quantities:

$$\Pr(\mathcal{K}_v) = \Pr((\sigma_v, \theta_v)), \quad \forall \sigma_v \in \{0, 1\}^{|C_v|},\ \forall\, 0 \le \theta_v \le n$$

which is the probability that the variables in $C_v$ take the values indicated by the boolean vector $\sigma_v$ (called a
configuration) and the number of variables in $\mathcal{T}_v \cap \mathcal{X}_i$ that are set to 1 is exactly equal to $\theta_v$.

After computing all of these values using dynamic programming, we can compute $\Pr(r(t_i) = h)$, $\forall h$, as:

$$\Pr(r(t_i) = h) = \sum_{\mathcal{K}_{root} = (\sigma_{root}, \theta_{root})\,:\, \sigma_{root}[X_{t_i}] = 1,\ \theta_{root} = h} \Pr(\mathcal{K}_{root})$$

In other words, we compute the total probability that $X_{t_i} = 1$, and that exactly $h$ variables in $\mathcal{X}_i$ are equal to 1 (i.e.,
exactly $h - 1$ tuples ranked above $t_i$ are present in the possible world).

Given the above framework, we can construct a recursive formula for computing $\Pr(\mathcal{K}_v)$ from (1) the computed
values for the children of the node $v$, and (2) the joint probability distributions corresponding to the clique and its
children. However, it is not computationally feasible to evaluate that recursion formula directly. Instead we develop a
generating functions-based algorithm for that purpose, which allows us to efficiently compute $\Pr(\mathcal{K}_v)$ for all nodes $v$ in
the junction tree.

Example 5 Let $T = \{t_1, t_2, t_3, t_4, t_5\}$. Figures 4(i) and (ii) show an undirected graphical model over the
corresponding variables and a calibrated junction tree $\mathcal{T}$ over them. Suppose we want to compute $\Pr(r(t_5) = 3)$. We root
the junction tree at the clique $v_1$. Further, assume $score(t_1) > score(t_2) > \ldots > score(t_5)$.

Then, to compute $\Pr(r(t_5) = 3)$, the valid configurations for the node $v_1$ are $\mathcal{K}_1 = ((1, 0), 3)$ and $\mathcal{K}_2 = ((1, 1), 3)$
(the first index corresponds to $X_{t_5}$, the second to $X_{t_4}$), giving us:

$$\Pr(r(t_5) = 3) = \Pr(\mathcal{K}_{v_1} = ((1, 1), 3)) + \Pr(\mathcal{K}_{v_1} = ((1, 0), 3))$$

The first term on the r.h.s. corresponds to the probability that both $X_{t_5}$ and $X_{t_4}$ are set to 1, and exactly one other
term in $X_{t_1}, \ldots, X_{t_5}$ is set to 1. The second term corresponds to $X_{t_5} = 1$, $X_{t_4} = 0$, and exactly two of
$X_{t_3}, X_{t_2}, X_{t_1}$ set to 1. □




Algorithm Details: We begin with some notations first. Let $\mathcal{T}_v$ be the subtree rooted at the vertex $v \in \mathcal{T}$. For a subset
$S \subseteq \mathcal{X}$, let $\sigma_S \in \{0, 1\}^{|S|}$ be a boolean vector of size $|S|$ which we call the existential configuration of $S$. Essentially,
$\sigma_S$ represents an outcome of the random variables in $S$. For ease of notation, we write $\sigma_v$, $\sigma_{u,v}$ and $\sigma_{\mathcal{T}_v}$ in place of
$\sigma_{C_v}$, $\sigma_{S_{u,v}}$ and $\sigma_{\cup_{u \in \mathcal{T}_v} C_u}$, respectively. Let $|\sigma|$ be the number of 1's in vector $\sigma$. We define $\sigma^i_S$ to be the restriction of
$\sigma_S$ which only contains entries in $S \cap \mathcal{X}_i$. We say two existential configurations $\sigma_{S_1}$ and $\sigma_{S_2}$ are consistent with each
other if they agree with each other on their intersection $S_1 \cap S_2$, denoted by $\sigma_{S_1} \sim \sigma_{S_2}$.

Suppose $v_1, v_2, \ldots, v_l$ are the children of $v$ in $\mathcal{T}$. Before providing the recursion formula, we need the following
simple fact, which follows from the running intersection property of junction trees.

    $\mathcal{X}_i$         $\{X_{t_1}, \ldots, X_{t_i}\}$
    $C_v$                   $\subseteq \mathcal{X}$, for each node $v \in \mathcal{T}$
    $S_{u,v}$               $C_u \cap C_v$, for each edge $(u, v) \in \mathcal{T}$
    $\mathcal{T}_v$         subtree rooted at $v$
    $\sigma_S$              $\in \{0, 1\}^{|S|}$, for $S \subseteq \mathcal{X}$; $\sigma_S[X_{t_j}] = 1$ indicates $X_{t_j} = 1$ for $X_{t_j} \in S$
    $\sigma^i_S$            restriction of $\sigma_S$ on entries in $S \cap \mathcal{X}_i$
    $\theta_v$              # nodes in $\mathcal{T}_v$

Table 2: Some Useful Notations



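Lemma 1 1. If $\sigma_{v_j} \sim \sigma_v$ and $\sigma_{v_k} \sim \sigma_v$ in $\mathcal{T}$, then $\sigma_{v_j} \sim \sigma_{v_k}$.
2. If $\sigma_{v_j} \sim \sigma_v$, then $\sigma_{\mathcal{T}_{v_j}} \sim \sigma_v$.

Proof: Both parts follow directly from the running intersection property of junction trees. More specifically, the first
part can be seen from the fact that $C_{v_j} \cap C_{v_k} \subseteq C_v$ and the second holds because $C_u \cap C_v \subseteq C_{v_j}$ for any $u \in \mathcal{T}_{v_j}$. □

We use the shorthand notation $[\sigma_{v_j}]$ to denote the set $\sigma_{v_1}, \ldots, \sigma_{v_l}$, $[\theta_{v_j}]$ to denote $\theta_{v_1}, \ldots, \theta_{v_l}$ and $[\mathcal{K}_{v_j}]$ to denote
$\mathcal{K}_{v_1}, \ldots, \mathcal{K}_{v_l}$. We say $\mathcal{K}_v$ is consistent with $[\mathcal{K}_{v_j}]$, denoted by $\mathcal{K}_v \sim [\mathcal{K}_{v_j}]$, if there exists a configuration $\sigma_{\mathcal{T}_v}$ such
that $\sigma_{\mathcal{T}_v} \sim \sigma_v \sim [\sigma_{v_j}]$, $|\sigma^i_{\mathcal{T}_v}| = \theta_v$ and its restriction on $\mathcal{T}_{v_j}$, denoted by $\sigma_{\mathcal{T}_{v_j}}$, satisfies $|\sigma^i_{\mathcal{T}_{v_j}}| = \theta_{v_j}$ for all $j$. The
following lemma characterizes the condition when such consistent configurations can be obtained. We use this lemma
for the construction of generating functions.

Lemma 2 $[\mathcal{K}_{v_j}] \sim \mathcal{K}_v$ if and only if $[\sigma_{v_j}] \sim \sigma_v$ and $|\sigma^i_v| - \sum_{j=1}^{l} |\sigma^i_{v,v_j}| = \theta_v - \sum_{j=1}^{l} \theta_{v_j}$.

Proof: Note that $\theta_v = |(\cup_{j=1}^{l} \sigma^i_{\mathcal{T}_{v_j}} \cup \sigma^i_v)|$ and $\sum_{j=1}^{l} \theta_{v_j} = \sum_{j=1}^{l} |\sigma^i_{\mathcal{T}_{v_j}}|$. Thus we only need to prove
$|(\cup_{j=1}^{h} \sigma^i_{\mathcal{T}_{v_j}} \cup \sigma^i_v)| = \sum_{j=1}^{h} |\sigma^i_{\mathcal{T}_{v_j}}| + |\sigma^i_v| - \sum_{j=1}^{h} |\sigma^i_{\mathcal{T}_{v_j}} \cap \sigma^i_v|$, for $h = l$. We prove this by induction on $h$. For $h = 0$, it holds trivially. Let
us assume it holds for values up to $h$. We have,

$$\Big|\bigcup_{j=1}^{h+1} \sigma^i_{\mathcal{T}_{v_j}} \cup \sigma^i_v\Big|
 = \Big|\bigcup_{j=1}^{h} \sigma^i_{\mathcal{T}_{v_j}} \cup \sigma^i_v\Big| + |\sigma^i_{\mathcal{T}_{v_{h+1}}}| - \Big|\Big(\bigcup_{j=1}^{h} \sigma^i_{\mathcal{T}_{v_j}} \cup \sigma^i_v\Big) \cap \sigma^i_{\mathcal{T}_{v_{h+1}}}\Big|$$
$$ = \sum_{j=1}^{h} |\sigma^i_{\mathcal{T}_{v_j}}| + |\sigma^i_v| - \sum_{j=1}^{h} |\sigma^i_{\mathcal{T}_{v_j}} \cap \sigma^i_v| + |\sigma^i_{\mathcal{T}_{v_{h+1}}}| - |\sigma^i_{\mathcal{T}_{v_{h+1}}} \cap \sigma^i_v|
 = \sum_{j=1}^{h+1} |\sigma^i_{\mathcal{T}_{v_j}}| + |\sigma^i_v| - \sum_{j=1}^{h+1} |\sigma^i_{\mathcal{T}_{v_j}} \cap \sigma^i_v|.$$

The first equality follows from the inclusion-exclusion principle and the second from the induction hypothesis and the
running intersection property. So, the claim holds. □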

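The recursion formula of dynamic programming is given by

$$\Pr(\mathcal{K}_v) = \frac{\pi(\sigma_v)}{\prod_{j=1}^{l} \pi(\sigma_{v,v_j})} \cdot \sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v} \prod_{j=1}^{l} \Pr(\mathcal{K}_{v_j}) \qquad (7)$$

The following theorem establishes the correctness of the computed recursion for dynamic programming.

Theorem 2 The recursion formula correctly computes the probability

$$\Pr(\mathcal{K}_v) = \sum_{\sigma_{\mathcal{T}_v} \sim \mathcal{K}_v} \Pr(\sigma_{\mathcal{T}_v}).$$

Proof: We prove this theorem by induction on the tree structure. Assume it holds for all of $v$'s children, i.e.,

$$\Pr(\mathcal{K}_{v_j}) = \sum_{\sigma_{\mathcal{T}_{v_j}} \sim \mathcal{K}_{v_j}} \Pr(\sigma_{\mathcal{T}_{v_j}})$$

for all $v_j \in \mathcal{T}_v$ and $\theta_{v_j}$. Then, plugging them into the recursion, we get

$$\Pr(\mathcal{K}_v) = \frac{\pi(\sigma_v)}{\prod_{j=1}^{l} \pi(\sigma_{v,v_j})} \sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v} \prod_{j=1}^{l} \Pr(\mathcal{K}_{v_j})$$
$$ = \frac{\pi(\sigma_v)}{\prod_{j=1}^{l} \pi(\sigma_{v,v_j})} \sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v} \prod_{j=1}^{l} \sum_{\sigma_{\mathcal{T}_{v_j}} \sim \mathcal{K}_{v_j}} \Pr(\sigma_{\mathcal{T}_{v_j}})$$
$$ = \sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v}\ \sum_{[\sigma_{\mathcal{T}_{v_j}}] \sim [\mathcal{K}_{v_j}]} \Big(\pi(\sigma_v) \prod_{j=1}^{l} \frac{\Pr(\sigma_{\mathcal{T}_{v_j}})}{\pi(\sigma_{v,v_j})}\Big)$$
$$ = \sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v}\ \sum_{[\sigma_{\mathcal{T}_{v_j}}] \sim [\mathcal{K}_{v_j}]} \Pr(\sigma_{\mathcal{T}_v}) \quad \text{(see explanation below)}$$
$$ = \sum_{\sigma_{\mathcal{T}_v} \sim \mathcal{K}_v} \Pr(\sigma_{\mathcal{T}_v}).$$

The last equality holds because of the definition of $[\mathcal{K}_{v_j}] \sim \mathcal{K}_v$. Now, we prove the fourth equality, $\pi(\sigma_v) \prod_{j=1}^{l} \frac{\Pr(\sigma_{\mathcal{T}_{v_j}})}{\pi(\sigma_{v,v_j})} =
\Pr(\sigma_{\mathcal{T}_v})$ for $\sigma_v \sim [\sigma_{\mathcal{T}_{v_j}}]$. $C_v$ is a separator of the different $\mathcal{T}_{v_j}$s, so by the local Markov assumption ($\mathcal{T}_{v_1} \perp \mathcal{T}_{v_2} \perp \ldots \mid C_v$),
$\mathcal{T}_{v_1}, \mathcal{T}_{v_2}, \ldots$ are conditionally independent of each other given $C_v$. Written in probability form, this is $\Pr(\sigma_{\mathcal{T}_v} \mid \sigma_v) =
\prod_{j=1}^{l} \Pr(\sigma_{\mathcal{T}_{v_j}} \mid \sigma_v)$. Therefore, for any $\sigma_{\mathcal{T}_v}$ such that $\sigma_{\mathcal{T}_v} \sim [\sigma_{\mathcal{T}_{v_j}}]$, we have

$$\Pr(\sigma_{\mathcal{T}_v}) = \Pr(\sigma_{\mathcal{T}_v} \mid \sigma_v) \Pr(\sigma_v) = \pi(\sigma_v) \prod_{j=1}^{l} \Pr(\sigma_{\mathcal{T}_{v_j}} \mid \sigma_v)
 = \pi(\sigma_v) \prod_{j=1}^{l} \frac{\Pr(\sigma_{\mathcal{T}_{v_j}})}{\Pr(\sigma_{v,v_j})} = \pi(\sigma_v) \prod_{j=1}^{l} \frac{\Pr(\sigma_{\mathcal{T}_{v_j}})}{\pi(\sigma_{v,v_j})}.$$

□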



Example 6 Suppose we are considering node $v_2$ and want to compute $\Pr(\mathcal{K}_{v_2} = ((1, 0), 2))$. Using Lemma 2, we know the
consistent configurations for nodes $v_3$ and $v_4$ are $((0, 0), 0), ((0, 1), 1)$ and $((0, 1), 1), ((0, 0), 0)$. Thus,

$$\Pr(\mathcal{K}_{v_2} = ((1, 0), 2)) = \frac{\Pr(X_4 = 1 \wedge X_3 = 0)}{\Pr(X_3 = 0)^2} \Big(\Pr(\mathcal{K}_{v_3} = ((0, 0), 0)) \Pr(\mathcal{K}_{v_4} = ((0, 1), 1)) + \Pr(\mathcal{K}_{v_3} = ((0, 1), 1)) \Pr(\mathcal{K}_{v_4} = ((0, 0), 0))\Big)$$
$$ = \frac{0.2}{0.4^2} (0.1 \times 0.3 + 0.1 \times 0.3) = 0.075$$

□

Figure 5 panels: Approximating with PRF^e ($\alpha = 1 - 0.9^i$): (i) IIP-100,000, k = 100; (ii) Syn-IND-1000, k = 100; (iii) Approximating PT(1000) (n = 100,000); (iv) No. of Terms vs Approximation Quality.

Figure 5: (i, ii) Comparing PRF^e with other ranking functions for varying values of $\alpha$; (iii) Approximating PT(1000)
using a linear combination of PRF^e functions; (iv) Approximation quality for three ranking functions for varying
number of exponentials.

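However, the number of different possible $[\mathcal{K}_{v_j}]$ consistent with $\mathcal{K}_v$ may be exponential. So it is still not clear
how (7) can be evaluated in polynomial time. Suppose we have already computed $\Pr(\mathcal{K}_{v_j})$ for all $v_j$ and all different
possible configurations of $\mathcal{K}_{v_j}$ and now we want to compute $\Pr(\mathcal{K}_v)$ for a fixed $\mathcal{K}_v = (\sigma_v, \theta_v)$. Note that when
$\mathcal{K}_v = (\sigma_v, \theta_v)$ is fixed, $\sigma_{v,v_j}$ becomes fixed $\forall j = 1, 2, \ldots, l$, since $\sigma_{v,v_j}$ contains the configurations of only a subset
of bits in $\sigma_v$. Thus, $I = \theta_v - |\sigma^i_v| + \sum_{j=1}^{l} |\sigma^i_{v,v_j}|$ (see Lemma 2) does not depend on the actual values of $[\sigma_{v_j}]$ if
$[\sigma_{v_j}] \sim \sigma_v$. So we can treat $I$ as a constant. Similarly, the term $\frac{\pi(\sigma_v)}{\prod_{j=1}^{l} \pi(\sigma_{v,v_j})}$ is also fixed.

Thus we can rewrite,

$$\sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v} \prod_{j=1}^{l} \Pr(\mathcal{K}_{v_j}) = \sum_{\sum_j \theta_{v_j} = I}\ \sum_{[\sigma_{v_j}] \sim \sigma_v} \prod_{j=1}^{l} \Pr(\mathcal{K}_{v_j})
 = \sum_{\sum_j \theta_{v_j} = I} \prod_{j=1}^{l} \Big(\sum_{\sigma_{v_j} \sim \sigma_v} \Pr(\mathcal{K}_{v_j})\Big) = \sum_{\sum_j \theta_{v_j} = I} \prod_{j=1}^{l} a_{v_j, \theta_{v_j}}$$

where $a_{v_j, \theta_{v_j}} = \sum_{\sigma_{v_j} \sim \sigma_v} \Pr(\mathcal{K}_{v_j})$. We know $\forall j = 1, \ldots, l$, $|\sigma_{v_j}| \le tw$. So if the treewidth $tw$ is a constant, there
are at most $2^{tw}$ (also a constant) different $\sigma_{v_j}$s. Thus, each $a_{v_j, \theta_{v_j}}$ can be computed in constant time for all $1 \le j \le l$
and $0 \le \theta_{v_j} \le n$. Consider the following generating function:

$$\mathcal{F}(x) = \prod_{j=1}^{l} \Big(\sum_{k=0}^{n} a_{v_j, k}\, x^k\Big)$$

It is not hard to see that the coefficient of $x^I$ is exactly equal to $\sum_{[\mathcal{K}_{v_j}] \sim \mathcal{K}_v} \prod_{j=1}^{l} \Pr(\mathcal{K}_{v_j})$. Therefore, we only have
to evaluate $a_{v_j, k}$ for all $v_j$ and $k$, and then construct and expand $\mathcal{F}(x)$ appropriately.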
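The role of the generating function here is to replace a sum over exponentially many tuples of child counts by a product of polynomials. The sketch below illustrates just that step on made-up coefficients and checks it against brute-force enumeration; the list-of-lists input format and the function names are assumptions, and the coefficients are arbitrary numbers standing in for the per-child quantities.

```python
from itertools import product

def poly_mul(p, q):
    """Multiply two univariate polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def constrained_sum(a, total):
    """a[j][k] is the coefficient for child j and count k. Returns the sum,
    over all (k_1, ..., k_l) with k_1 + ... + k_l == total, of prod_j a[j][k_j],
    read off as the coefficient of x^total in the product polynomial."""
    poly = [1.0]
    for coeffs in a:
        poly = poly_mul(poly, coeffs)
    return poly[total] if total < len(poly) else 0.0

def brute_force(a, total):
    """Same quantity by explicit enumeration (exponential in len(a))."""
    result = 0.0
    for ks in product(*(range(len(c)) for c in a)):
        if sum(ks) == total:
            term = 1.0
            for coeffs, k in zip(a, ks):
                term *= coeffs[k]
            result += term
    return result

# Arbitrary illustrative coefficients for three children
a = [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6, 0.1, 0.2, 0.1]]
```

Expanding the product once yields the constrained sums for every target total simultaneously, which is what makes a single generating function per fixed clique configuration sufficient.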

Running Time: We need to run our dynamic program $n$ times, once for each tuple $t_i$. In the dynamic program, we need
to construct and expand $O(2^{|C_v|})$ generating functions for each node $v$ in $\mathcal{T}$, since one generating function is able to
generate $\Pr(\mathcal{K}_v = (\sigma_v, \theta_v))$ for a fixed $\sigma_v$ and all $1 \le \theta_v \le n$ and we have $2^{|C_v|}$ different $\sigma_v$ values. To construct and
expand a generating function at node $v$ takes $O(deg(v)(n 2^{|C_v|} + n^2))$ time. Specifically, computing each $a_{v_j, k}$ takes
$O(2^{|C_v|})$ and expanding the polynomial takes $O(deg(v)\, n^2)$ time. Therefore, the time complexity is $O(2^{tw}(n 2^{tw} +
n^2)|\mathcal{T}|)$ for each execution of the dynamic program, resulting in an overall time complexity of $O(2^{tw} n^2 (2^{tw} + n)|\mathcal{T}|)$,
where $tw$ is the treewidth of the junction tree $\mathcal{T}$.

7 Experimental Study 

We conducted an extensive empirical study over several real and synthetic datasets to illustrate: (a) the diverse and
conflicting behavior of different ranking functions proposed in the prior literature, (b) the effectiveness of our
parameterized ranking functions, especially PRF^e, at approximating other ranking functions, and (c) the scalability of our
new generating functions-based algorithms for exact and approximate ranking. We discussed the results supporting
(a) in Section 3.2. In this section, we focus on (b) and (c).

Datasets: We mainly use the International Ice Patrol (IIP) Iceberg Sighting Dataset for our experiments. This dataset
was also used in prior work on ranking in probabilistic databases [27, 23]. The database contains a set of iceberg
sighting records, each of which contains the location (latitude, longitude) of the iceberg, and the number of days the
iceberg has drifted, among other attributes. Detecting the icebergs that have been drifting for long periods is crucial,
and hence we use the number of days drifted as the ranking score. The sighting record is also associated with a
confidence-level attribute according to the source of sighting: R/V (radar and visual), VIS (visual only), RAD (radar
only), SAT-LOW (low earth orbit satellite), SAT-MED (medium earth orbit satellite), SAT-HIGH (high earth orbit
satellite), and EST (estimated). We converted these seven confidence levels into probabilities 0.8, 0.7, 0.6, 0.5, 0.4, 0.3,
and 0.4 respectively. We added a very small Gaussian noise to each probability so that ties could be broken. There
are nearly a million records available from 1960 to 2007; we created 10 different datasets for our experimental study
containing 100,000 (IIP-100,000) to 1,000,000 (IIP-1,000,000) records, by uniformly sampling from the original
dataset.

Along with the real datasets, we also use several synthetic datasets with varying degrees of correlations, where 
the correlations are captured using probabilistic and/xor trees. The tuple scores (for ranking) were chosen uniformly 
at random from [0, 10000]. The corresponding and/xor trees were also generated randomly starting with the root, by 
controlling the height (L), the maximum degree of the non-root nodes (d), and the proportion of xor nodes to and nodes 
(X/A) in the tree. Specifically, we use five such datasets: 

(1) Syn-IND (independent tuples), (2) Syn-XOR (L=2, X/A=∞, d=5), (3) Syn-LOW (L=3, X/A=10, d=2), (4) Syn-MED 
(L=5, X/A=3, d=5), and (5) Syn-HIGH (L=5, X/A=1, d=10). 

For Syn-IND, the tuple existence probabilities were chosen uniformly at random from [0, 1]. Note that the Syn-XOR 
dataset, with height set to 2 and no and nodes, exhibits only mutual exclusivity correlations (mimicking the x-tuples 
model [32, 36]), whereas the latter three datasets exhibit increasingly more complex correlations. 

Setup: We use the normalized Kendall distance (Section 3.2) for comparing two top-k rankings. All the algorithms 
were implemented in C++, and the experiments were run on a 2.4GHz Linux PC with 2GB memory. 
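
For concreteness, one way to compare two top-k lists is sketched below. The exact definition used in our experiments is the one given in Section 3.2, which is not reproduced here; this sketch assumes a common top-k variant in which a pair of tuples is penalized only when the two lists provably order it differently (a tuple missing from a list is treated as ranked below all listed tuples), and the count is normalized by k^2. The function name is hypothetical.

```python
from itertools import combinations

def normalized_kendall_topk(list1, list2):
    """Assumed top-k Kendall distance in [0, 1] between two lists of length k."""
    k = len(list1)
    if k == 0 or len(list2) != k:
        raise ValueError("both lists must be non-empty and of equal length")
    pos1 = {t: r for r, t in enumerate(list1)}
    pos2 = {t: r for r, t in enumerate(list2)}
    items = list(pos1.keys() | pos2.keys())
    disagreements = 0
    for a, b in combinations(items, 2):
        # A missing tuple gets rank k, i.e. below every listed tuple.
        d1 = pos1.get(a, k) - pos1.get(b, k)
        d2 = pos2.get(a, k) - pos2.get(b, k)
        # Strictly opposite signs: the two lists order (a, b) differently.
        if d1 * d2 < 0:
            disagreements += 1
    return disagreements / (k * k)
```

Under this assumed variant, identical lists are at distance 0 and two disjoint lists are at distance 1.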

7.1 Approximability of Ranking Functions 

We begin with a set of experiments illustrating the effectiveness of our parameterized ranking functions at 
approximating other ranking functions. Due to space constraints, we focus on PRF^e here because it is significantly faster 
to rank according to a PRF^e function (or a linear combination of several PRF^e functions) than it is to rank according 
to a PRF^ω function. 

Figures 5(i) and (ii) show the Kendall distance between the Top-100 answers computed using a specific ranking 
function and PRF^e for varying values of α, for the IIP-100,000 and Syn-IND-1000 datasets. For better visualization, 
we plot i on the x-axis, where α = 1 − 0.9^i. The reason is that the behavior of the PRF^e function changes 
rather drastically, and spans a spectrum of rankings, as α approaches 1. First, as we can see, the PRF^e ranking is 
close to ranking by Score alone for small values of α, whereas it is close to the ranking by Probability when α is close 
to 1 (in fact, for α = 1, the PRF^e ranking is equivalent to the ranking of tuples by their existence probabilities; see 
footnote 3). Second, we see that, for all other functions (E-Score, PT(h), U-Rank, Exp-Rank), there exists a value of α 
for which the distance of that function to PRF^e is very small, indicating that PRF^e can indeed approximate those 
functions quite well. Moreover, we observe that this "uni-valley" behavior of the curves justifies the binary search 
algorithm we advocate for learning the value of α in Section 5.2. Our experiments with other synthetic and real datasets 
indicated very similar behavior by the ranking functions. 
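
To illustrate how a single-valley distance curve can be exploited, the sketch below searches over the index i of the reparameterization α = 1 − 0.9^i. It is not the algorithm of Section 5.2, which is not reproduced here; it is a plain integer ternary search that we substitute for illustration, and it assumes that the supplied distance function is strictly unimodal in i. The names alpha_from_index and search_alpha_index, and the default search range, are hypothetical.

```python
def alpha_from_index(i):
    """Map the plotted x-axis index i to alpha = 1 - 0.9**i."""
    return 1.0 - 0.9 ** i

def search_alpha_index(distance, lo=0, hi=200):
    """Return the index i in [lo, hi] minimizing distance(alpha_from_index(i)).

    Assumes the distance is strictly unimodal (one valley) over the range.
    """
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if distance(alpha_from_index(m1)) < distance(alpha_from_index(m2)):
            hi = m2 - 1   # the valley lies strictly to the left of m2
        else:
            lo = m1 + 1   # the valley lies strictly to the right of m1
    return min(range(lo, hi + 1), key=lambda i: distance(alpha_from_index(i)))
```

In practice, distance would be a callable that ranks the sampled tuples with PRF^e for the given α and returns the Kendall distance to the user-provided ranking; a very sharp or very narrow valley makes any such search harder to apply with a small sample.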

Next we evaluate the effectiveness of our approximation technique presented in Section 5.1. In Figure 5(iii), we 
show the Kendall distance between the top-k answers obtained using PT(h) (for h = 1000, k = 1000) and using 
a linear combination of PRF^e functions found by our algorithms. As expected, the approximation using the vanilla 

2 http://nsidc.org/data/g00807.html 

3 On the other hand, for α = 0, PRF^e ranks the tuples by their probabilities of being the Top-1 answer. 
[Figure 6 panels: (i) Learning PRF-e (n=100000, k=100), x-axis: # Samples; (ii) Learning PRF-w (n=100000, k=100), 
x-axis: # Samples; (iii) Effect of correlation, x-axis: alpha; (iv) Effect of correlation, x-axis: Dataset] 

Figure 6: (i) Learning PRF^e from user preferences; (ii) Learning PRF^ω from user preferences; (iii) Effect of 
correlations on PRF^e ranking as α varies; (iv) Effect of correlations on PRF^e, U-Rank and PT(h). 
Figure 7: Experiments comparing the execution times of the ranking algorithms (note that the y-axis is log-scale for 
(ii) and (iii)). 



DFT technique performs very poorly, with a Kendall distance close to 0.8, indicating little similarity between the top-k 
answers. However, the approximation obtained using our proposed algorithm (indicated by the DFT+DF+IS+ES curve) 
achieves a Kendall distance of less than 0.1 with just L = 20 exponentials. 

In Figure 5(iv), we compare the approximation quality (found by our algorithm DFT+DF+IS+ES) for three ranking 
functions for two datasets: IIP-100,000 with k = 100, and IIP-1,000,000 with k = 10000. The ranking 
functions we compared were: (1) PT(h) (h = 1000), (2) an arbitrary smooth function, sfunc (see Figure 3(iv)), and 
(3) a linear function (Figure 3(iii)). We see that L = 40 suffices to bring the Kendall distance below 0.1 in all cases. We 
also observe that smooth functions (for which the absolute value of the first derivative of the underlying continuous 
function is bounded by a small value) are usually easier to approximate. We only need L = 20 exponentials to achieve 
a Kendall distance of less than 0.05 for sfunc. The linear function is even easier to approximate. We also tested a few 
other continuous functions, such as piecewise linear functions and f(x) = 1/x, and found similar behavior. We omit 
those curves for clarity. 

7.2 Learning Ranking Functions 

Next we consider the issue of learning ranking functions from user preferences. Lacking real user preference data, 
we instead assume that the user ranking function, denoted user-func, is identical to one of: E-Score, PT(h), U-Rank, 
Exp-Rank, or PRF^e (α = 0.95). We generate a set of user preferences by ranking a random sample of the dataset 
using user-func (thus generating five sets of user preferences). These are then fed to the learning algorithm, and finally 
we compare the Kendall distance between the learned ranking and the true ranking for the entire dataset. 

In Figure 6(i), we plot the results for learning a single PRF^e function (i.e., for learning the value of α) using the 
binary search-like algorithm presented in Section 5.2. The experiment reveals that when the underlying ranking is 
done by PRF^e, the value of α can be learned perfectly. When one of PT(h) or U-Rank is the underlying ranking 
function, the correct value of α can be learned with a fairly small sample size, and increasing the number of samples 
does not help in finding a better α. On the other hand, Exp-Rank cannot be learned well by PRF^e unless the sample size 
approaches the size of the whole dataset. This phenomenon can be partly explained using Figure 3(i), in which the 
curves for PT(h) and U-Top have a fairly smooth valley, while the one for Exp-Rank is very sharp and the region of α 
values where the distance is low is extremely small ([1 − 0.9^90, 1 − 0.9^110]). Hence, the minimum point for Exp-Rank 
is harder to reach. Another reason is that Exp-Rank is quite sensitive to the size of the dataset, which makes it hard 
to learn using a smaller sample. We also observe that while extremely large samples are able to learn 
E-Score well, the behavior of E-Score is quite unstable when the sample size is smaller. 

Note that if we already know the form of the ranking function, we don't need to learn it in this fashion; we can 
instead directly find an approximation for it using our DFT-based algorithm. 

In Figure 6(ii), we show the results of an experiment where we tried to learn a PRF^ω function (using the SVM-lite 
package [28]). We keep our sample size < 200 since SVM-lite becomes drastically slower with larger sample sizes. 
First we observe that PT(h) and PRF^e can be learned very well from a small sample (distance < 0.2 in most 
cases), and increasing the sample size does not help significantly. U-Rank can also be learned, but the approximation 
isn't nearly as good. This is because U-Rank cannot be written as a single PRF^ω function. We observed similar 
behavior in our experiments with other datasets. Due to space constraints, we omit a further discussion on learning a 
PRF^ω function; the issues in learning such weighted functions have been investigated in prior literature, and if the 
true ranking function can be written as a PRF^ω function, then the above algorithm is expected to learn it well given 
a reasonable number of samples. 

7.3 Effect of Correlations 

Next we evaluate the behavior of ranking functions over probabilistic datasets modeled using probabilistic and/xor 
trees. We use the four synthetic correlated datasets, Syn-XOR, Syn-LOW, Syn-MED, and Syn-HIGH, for these 
experiments. For each dataset and each ranking function considered, we compute the rankings by considering the 
correlations, and by ignoring the correlations, and then compute the Kendall distance between these two (e.g., for 
PRF^e, we compute the rankings using the PROB-ANDOR-PRF-RANK and IND-PRF-RANK algorithms). Figure 
6(iii) shows the results for the PRF^e ranking function for varying α, whereas in Figure 6(iv), we plot the results for 
PRF^e (α = 0.9), PT(100), and U-Rank. As we can see, on highly correlated datasets, ignoring the correlations can 
result in significantly inaccurate top-k answers. This is not as pronounced for the Syn-XOR dataset. This is because, 
in any group of tuples that are mutually exclusive, there are typically only a few tuples that may have sufficiently high 
probabilities to be part of the top-k answer; the rest of the tuples may be ignored for ranking purposes. Because of 
this, assuming tuples to be independent of each other does not result in significant errors. As α approaches 1, PRF^e 
tends to sort the tuples by probabilities, so all four curves in Figure 6(iii) become close to 0. We note that ranking by 
E-Score is invariant to the correlations, which is a significant drawback of that function. 
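
The invariance of E-Score can be seen from a one-line derivation. We assume here the usual reading of E-Score as the expected score of a tuple, where a tuple t contributes score(t) in the possible worlds that contain it and 0 elsewhere; the symbol pw, ranging over possible worlds, is notation introduced only for this illustration. By linearity of expectation,

\[
\text{E-Score}(t) \;=\; \sum_{pw} \Pr(pw)\,\mathrm{score}(t)\,\mathbf{1}[t \in pw] \;=\; \mathrm{score}(t)\cdot\Pr(t).
\]

The right-hand side depends only on the marginal existence probability Pr(t), so two datasets with the same marginals but different correlation structures yield the same E-Score ranking.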

7.4 Execution Times 

Figure 7(i) shows the execution times for four ranking functions: PRF^e, PT(h), U-Rank and Exp-Rank, for the 
IIP datasets, for different dataset sizes and k. We note that the running time for PRF^ω is similar to that of PT(h). 
As expected, ranking by PRF^e or Exp-Rank is very efficient (1,000,000 tuples can be ranked within 1 or 2 seconds). 
Indeed, after sorting the dataset in a non-decreasing score order, PRF^e needs only a single scan of the dataset, and 
Exp-Rank needs to scan the dataset twice. Execution times for PT(h) and U-Rank-k increase linearly with h and k, 
respectively, and the algorithms become very slow for high h and k. The running times of both PRF^e and Exp-Rank 
are not significantly affected by k. 
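
The single-scan behavior can be illustrated with a short sketch for independent tuples. It is not our C++ implementation; it assumes the weight function ω(i) = α^i for PRF^e, tuple independence, and a scan that visits tuples from the highest score downward, in which case the value of the i-th tuple is p_i · α · ∏_{j<i} (1 − p_j + p_j α). The function name and the input layout are hypothetical.

```python
def prf_e_rank(tuples, alpha, k):
    """Top-k tuple ids under the assumed PRF^e value for independent tuples.

    tuples: iterable of (tuple_id, score, probability).
    """
    ordered = sorted(tuples, key=lambda t: t[1], reverse=True)
    prefix = 1.0  # product of (1 - p_j + p_j * alpha) over higher-scored tuples
    values = []
    for tid, _score, p in ordered:
        values.append((p * alpha * prefix, tid))
        prefix *= 1.0 + p * (alpha - 1.0)  # same as 1 - p + p * alpha
    values.sort(key=lambda v: v[0], reverse=True)
    return [tid for _value, tid in values[:k]]
```

For α = 1 every factor equals 1 and the ranking reduces to sorting by existence probability, while for α close to 0 the value is dominated by the probability of being the Top-1 answer. For very large inputs and α < 1 the running product can underflow, so a practical version would likely work with logarithms.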

Figure 7(ii) compares the execution time for PT(h) and its approximation using a linear combination of PRF^e 
functions (see Figure 5(iii)), for two different values of k. The label w50 indicates that 50 exponentials were used in the 
approximation (note that the approximate ranking, based on PRF^e, is insensitive to the value of k). As we can see, for 
large datasets and for higher values of k, exact computation takes several orders of magnitude longer 
than the approximation. For example, the exact algorithm takes nearly 1 hour for n = 500,000 and h = 10,000, 
while the approximate answer obtained using L = 50 PRF^e functions takes only 24 seconds and achieves a Kendall 
distance of 0.09. 

For correlated datasets, the effect is even more pronounced. In Figure 7(iii), we plot the results of a similar 
experiment, but using two correlated datasets: Syn-XOR and Syn-HIGH. Note that the number of tuples in these 
datasets is smaller by a factor of 10. As we can see, our generating functions-based algorithms for computing PRF^e 
are highly efficient, even for datasets with high degrees of correlation. As above, approximation of the PT(h) ranking 
function using a linear combination of PRF^e functions is significantly cheaper to compute than using the exact 
algorithm. 

Combined with the previous results illustrating that a linear combination of PRF^e functions can approximate 
other ranking functions very well, this validates the unified ranking approach that we propose in this paper. 

8 Conclusions 

In this paper we presented a unified framework for ranking over probabilistic databases, and presented several novel 
and highly efficient algorithms for answering top-k queries. Considering the complex interplay between probabilities 
and scores, instead of proposing a specific ranking function, we propose using two parameterized ranking functions, 
called PRF^ω and PRF^e, which allow the user to control the tuples that appear in the top-k answers. We developed 
novel algorithms for evaluating these ranking functions over large, possibly correlated, probabilistic datasets. We 
also developed an approach for approximating a ranking function using a linear combination of PRF^e functions, 
thus enabling highly efficient, albeit approximate, computation, and an approach for learning a ranking function from 
user preferences. Our work opens up many avenues for further research that we are planning to pursue. For instance, 
there may be other non-trivial subclasses of PRF functions, aside from PRF^e, that can be computed very efficiently. 
Understanding the behavior of various ranking functions and their relationships across probabilistic databases with 
diverse uncertainties and correlation structures also remains an important open problem in this area. 
A Multiplication of a Set of Polynomials 

Given a set of polynomials $P_i = \sum_{j \ge 0} c_{ij} x^j$ for $1 \le i \le k$, we want to compute the product 
$P = \prod_{i=1}^{k} P_i$ written in the form $P = \sum_{j \ge 0} c_j x^j$. Let $d(P)$ be the degree of the polynomial $P$. 

First we note that the naive method (multiplying the $P_i$'s one by one) gives us an $O(n^2)$ time algorithm by a simple 
counting argument. Let $\bar{P}_i = \prod_{j=1}^{i} P_j$. It is easy to see that $d(\bar{P}_i) = \sum_{j=1}^{i} d(P_j)$. 
So the time to multiply $\bar{P}_i$ and $P_{i+1}$ is $O(d(\bar{P}_i) \cdot d(P_{i+1}))$. Then, we can see that the total 
time complexity is 

\[
\sum_{i=1}^{k-1} O\bigl(d(\bar{P}_i) \cdot d(P_{i+1})\bigr) = O(n) \cdot \sum_{i=1}^{k-1} d(P_{i+1}) = O(n^2).
\]

Now, we show how to use divide-and-conquer and FFT (Fast Fourier Transform) to achieve an $O(n \log^2 n)$ 
time algorithm. It is well known that the multiplication of two polynomials of degree $O(n)$ can be done in $O(n \log n)$ 
time using FFT. The divide-and-conquer algorithm is as follows: If there exists any $P_i$ such that 
$d(P_i) \ge \frac{1}{3} d(P)$, we evaluate $\prod_{j \ne i} P_j$ recursively and then multiply it with $P_i$ using FFT. If 
not, we partition all $P_i$'s into two sets $S_1$ and $S_2$ such that 
$\frac{1}{3} d(P) \le d\bigl(\prod_{i \in S_j} P_i\bigr) \le \frac{2}{3} d(P)$ for $j = 1, 2$. Then we evaluate $S_1$ and 
$S_2$ separately and multiply them together using FFT. It is easy to see that the time complexity of the algorithm running 
on input size $n$ satisfies 

\[
T(n) \le \max\Bigl\{ T\bigl(\tfrac{2}{3} n\bigr) + O(n \log n),\; T(n_1) + T(n_2) + O(n \log n) \Bigr\}
\]

where $n_1 + n_2 = n$ and $\frac{1}{3} n \le n_1 \le n_2 \le \frac{2}{3} n$. By solving the above recurrence, we know 
$T(n) = O(n \log^2 n)$. 
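
The divide-and-conquer scheme above can be sketched as follows. This is an illustrative pure-Python version, not our C++ implementation: polynomials are assumed to be coefficient lists in increasing degree order, the FFT is a simple recursive radix-2 transform, the split in the second case is a greedy prefix partition, and all function names are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def multiply(p, q):
    """Product of two coefficient lists via FFT (floating-point result)."""
    length = len(p) + len(q) - 1
    size = 1
    while size < length:
        size *= 2
    fp = fft([complex(c) for c in p] + [0j] * (size - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    return [v.real / size for v in prod[:length]]

def degree(p):
    return len(p) - 1

def product(polys):
    """Multiply a list of polynomials with the degree-balanced recursion."""
    if not polys:
        return [1.0]
    if len(polys) == 1:
        return list(polys[0])
    total = sum(degree(p) for p in polys)
    heavy = max(range(len(polys)), key=lambda i: degree(polys[i]))
    # Case 1: some polynomial has degree at least one third of the total.
    if 3 * degree(polys[heavy]) >= total:
        rest = polys[:heavy] + polys[heavy + 1:]
        return multiply(product(rest), polys[heavy])
    # Case 2: take a prefix until it holds at least one third of the total degree;
    # since every degree is below total/3, the prefix stays below two thirds.
    split, prefix_degree = 0, 0
    while 3 * prefix_degree < total:
        prefix_degree += degree(polys[split])
        split += 1
    return multiply(product(polys[:split]), product(polys[split:]))
```

For example, product([[1, 1]] * 4) returns coefficients that round to [1, 4, 6, 4, 1], the expansion of (1 + x)^4. Because the transform works in floating point, coefficients are exact only up to rounding error.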